PhAIL

Show HN: PhAIL – Real-robot benchmark for AI models

68 AI Score
Show HN  ·  other  ·  Added Apr 1, 2026

Details

Sector: other
Total Funding: $0
Last Round: $0

About

I built this because I couldn't find honest numbers on how well VLA models [1] actually work on commercial tasks. I come from search ranking at Google where you measure everything, and in robotics nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking – one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.

Best model: 64 UPH. Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public – every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions.

Happy to answer questions about methodology, the models, or what we observed.

[1] Vision-Language-Action: https://en.wikipedia.org/wiki/Vision-language-action_model
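To put the three throughput figures quoted in the post side by side, here is a minimal sketch of the arithmetic. The dictionary keys and the helper name `relative_throughput` are illustrative, not part of PhAIL, and the "1,300+" figure is treated as exactly 1,300, so the by-hand gap shown is a lower bound.

```python
# Throughput figures quoted in the post, in units per hour (UPH).
# "1,300+" is a lower bound; 1300 is assumed here as a conservative value.
uph = {
    "best_model": 64,
    "human_teleop": 330,
    "human_by_hand": 1300,
}


def relative_throughput(baseline: str, candidate: str = "best_model") -> float:
    """Return the candidate's throughput as a fraction of the baseline's."""
    return uph[candidate] / uph[baseline]


for baseline in ("human_teleop", "human_by_hand"):
    frac = relative_throughput(baseline)
    print(f"best model vs {baseline}: {frac:.1%} of baseline, {1 / frac:.1f}x slower")

# Average seconds per picked unit for the best model.
seconds_per_unit = 3600 / uph["best_model"]
print(f"best model: {seconds_per_unit:.2f} s per unit")
```

On these numbers the best model reaches roughly a fifth of teleoperated throughput and about a twentieth of by-hand throughput.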

AI Score Reasoning

PhAIL addresses a critical bottleneck in the robotics industry: the lack of standardized, objective benchmarking for VLA models. Its founder has a high-signal background in search ranking at Google. The project is early-stage with modest traction, but it could become the "trust layer" for autonomous warehouse operations, which offers significant upside if it achieves industry-wide adoption.
