VLABench

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Language-conditioned Manipulation Planning	VLABench	Success Rate	64.83	35
High-level planning	VLABench vlm_evaluation 1.0 (test)	AUC	85.8	25
Success Rate Evaluation	VLABench	Average Success Rate	46.3	19
Failure Detection	VLABench (Unseen Tasks)	bACC	71.3	12
Failure Detection	VLABench (Seen Tasks)	Balanced Accuracy (bACC)	85.6	12
Embodied Task Execution	VLABench Avg.	Progress Score (PS)	65.9	10
Embodied Task Execution	VLABench Instruction	Progress Score (PS)	70.2	10
Embodied Task Execution	VLABench Commonsense	Progress Score (PS)	57.3	10
Embodied Task Execution	VLABench Cross Category	Primitive Success (PS)	61	10
Embodied Task Execution	VLABench In-dist.	Progress Score (PS)	81.1	10
Embodied Task Execution	VLABench Texture	Progress Score (PS)	62.3	9
Robotic Manipulation Reasoning	VLABench	In Distribution Accuracy	78	9
Robotic Manipulation	VLABench seen object configurations π0.5	Success Rate	38.8	8
Robot Action Prediction	VLABench T2	Success Rate	39.9	8
Robot Action Prediction	VLABench T1	Success Rate (SR)	64.8	8
Robotic Manipulation	VLABench GR00T-N1.6 (seen object configurations)	Success Rate	35.7	5
Robotic Manipulation	VLABench Average (test)	IS	71.2	5
Robotic Manipulation	VLABench Unseen Texture (test)	IS	76	5
Robotic Manipulation	VLABench Semantic Instruction (test)	IS Score	82	5
Robotic Manipulation	VLABench Common Sense (test)	IS	74.2	5
Robotic Manipulation	VLABench Cross-Category (test)	IS	54.1	5
Robotic Manipulation	VLABench Real-world (Average)	Success Rate (SR)	55	5
Robotic Manipulation	VLABench Real-world (Long Horizon Tier)	Success Rate (SR)	38	5
Robotic Manipulation	VLABench Real-world (Semantic Tier)	Success Rate (SR)	42	5
Robotic Manipulation	VLABench Real-world (Basic Tier)	Success Rate (SR)	97	5

Showing 25 of 42 rows