Nebius Build Berlin · April 28, 2026
The VLA Reality Check
PhAIL — the Physical AI Leaderboard. Real hardware. Real metrics.
Sergey Arkhangelsky
Founder, Positronic Robotics · Joint with Nebius
phail.ai
Is today's model better than yesterday's?
Are we actually making progress?
The problem
Answering honestly is harder than it sounds.
Four methodological traps make most VLA comparisons misleading.
01 · Operator and environment shift outcomes
02 · Different models speak different languages
03 · One metric isn't enough: speed, reliability, failure modes
04 · 10 runs don't prove anything
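To make the "10 runs" trap concrete: a success rate measured over 10 rollouts carries a very wide confidence interval. The sketch below computes a 95% Wilson score interval for a binomial success rate in plain Python. The success counts are illustrative, not PhAIL measurements, and the Wilson interval is just one standard choice; others, such as Clopper-Pearson, make the same point.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative counts only: the same observed 70% success rate at two sample sizes.
for successes, trials in [(7, 10), (70, 100)]:
    lo, hi = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: 95% CI [{lo:.2f}, {hi:.2f}]")

# Two hypothetical models at 7/10 and 5/10: do their intervals overlap?
a, b = wilson_interval(7, 10), wilson_interval(5, 10)
print("intervals overlap" if a[0] <= b[1] and b[0] <= a[1] else "intervals separated")
```

At 7/10 the interval runs from about 0.40 to 0.89, and a rival at 5/10 overlaps it substantially; at 70/100 the same observed rate narrows to about 0.60 to 0.78.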
The response
What honest eval looks like.
Four principles. Each one a direct answer to a trap above.
Same-session, blinded A/B → No drift, no bias
One inference API → Apples-to-apples
Full data → Any metric you want
Enough rollouts → Signal, not noise
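One way to realize the same-session, blinded A/B principle, sketched under assumptions: a randomized block design in which every model runs once per round, in a freshly shuffled order, within a single session, while the operator sees only opaque labels. The function name, the `policy-N` label format, and the model names are hypothetical; this is not PhAIL's actual protocol.

```python
import random
from typing import Optional


def blinded_schedule(
    models: list[str], rounds: int, seed: Optional[int] = None
) -> tuple[list[str], dict[str, str]]:
    """Build a same-session rollout order with opaque labels.

    Returns (schedule, key): the operator sees only `schedule`, a list of opaque
    labels; `key` maps labels back to model names and is withheld until scoring.
    """
    rng = random.Random(seed)
    shuffled = list(models)
    rng.shuffle(shuffled)  # so label numbering reveals nothing about model identity
    key = {f"policy-{i + 1}": name for i, name in enumerate(shuffled)}

    schedule: list[str] = []
    for _ in range(rounds):
        # Randomized blocks: every model runs once per round in a fresh random
        # order, so slow drift (lighting, wear, operator fatigue) is shared
        # roughly equally by all models instead of landing on one model's block.
        block = list(key)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule, key


schedule, key = blinded_schedule(["model_a", "model_b", "model_c"], rounds=3, seed=7)
print(schedule)  # operator-facing order of opaque labels
```

Compared with running each model in one contiguous block, interleaving by rounds keeps time-dependent effects from being confounded with model identity.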
Headline results
Where VLAs are in April 2026.
Three numbers from running four open-source VLAs on the same rig.
5% of human throughput (best model, pick-and-place)
~4 min mean time between human assists
−22 pp for GR00T when the camera is occluded; OpenPI loses just 6 pp. Robustness is not equal.
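Given the "full data" principle, headline numbers like these can be recomputed from raw rollout logs. A minimal sketch, assuming a hypothetical per-episode record (duration, items placed, assist count); the field names and exact metric definitions are illustrative assumptions, not PhAIL's published formulas, and the logs below are made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical per-rollout log record; not PhAIL's actual schema.
    duration_s: float  # wall-clock length of the rollout, in seconds
    items_placed: int  # completed pick-and-place operations
    assists: int       # human interventions during the rollout


def throughput_per_hour(episodes: list[Episode]) -> float:
    total_s = sum(e.duration_s for e in episodes)
    return 3600.0 * sum(e.items_placed for e in episodes) / total_s


def mean_time_between_assists_s(episodes: list[Episode]) -> float:
    # Simplification: total rollout time divided by assist count. A stricter
    # definition might exclude the time spent during each assist.
    total_s = sum(e.duration_s for e in episodes)
    n_assists = sum(e.assists for e in episodes)
    return float("inf") if n_assists == 0 else total_s / n_assists


def pp_change(baseline_rate: float, perturbed_rate: float) -> float:
    """Change between two rates in [0, 1], in percentage points (negative = drop)."""
    return 100.0 * (perturbed_rate - baseline_rate)


# Made-up logs for illustration only.
model_runs = [Episode(600, 6, 2), Episode(600, 3, 3)]
human_runs = [Episode(600, 90, 0)]
ratio = throughput_per_hour(model_runs) / throughput_per_hour(human_runs)
print(f"{ratio:.0%} of human throughput")
print(f"{mean_time_between_assists_s(model_runs) / 60:.1f} min between assists")
print(f"{pp_change(0.80, 0.58):+.0f} pp under occlusion")
```

The inputs were chosen only to exercise the arithmetic (here 5% of human throughput and 4.0 minutes between assists); they are not PhAIL data.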
What's next
PhAIL in the coming months.
Trossen bimanual
New embodiment landing on the rig.
More tasks
Custom evaluation tasks on request.
Get your model on the board
Free Nebius credits to fine-tune and submit. Come find us after the talk.
So, are we making progress?
Now we can answer.
— and you can, too.
phail.ai