VLABench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Success Rate Evaluation | VLABench | Average Success Rate: 46.3 | 19 |
| Failure Detection | VLABench (Unseen Tasks) | bACC: 71.3 | 12 |
| Failure Detection | VLABench (Seen Tasks) | Balanced Accuracy (bACC): 85.6 | 12 |
| Robot Manipulation | VLABench | Toy Success Rate: 70 | 5 |
| Language-conditioned visual reasoning | VLABench official (test) | Precision Score (Toy): 76 | 4 |
| Robotic Task Planning | VLABench | Toy Success Rate: 54 | 4 |
| Language-conditioned visual reasoning | VLABench | SR (Toy): 54 | 4 |
| Robotic Manipulation Reasoning | VLABench | In Distribution Accuracy: 54 | 3 |
| Robotic Manipulation | VLABench 5 public tracks v1.0 | IS (In-dist): 79.8 | 3 |
| Robot Manipulation | VLABench Cross Category | Add Condiment Success Rate: 14 | 2 |
| Robot Manipulation | VLABench In Distribution | Add Condiment Success: 63 | 2 |