Benchmarks

Task Name	Dataset Name	SOTA Result
Tabular Data Synthesis	Small Benchmark	Shape 12.407	13
Embodied vision-language reasoning	Original benchmark B	Score 61.26	13
Camera Pose Estimation	Zero-shot cross-domain benchmark (test)	Mean 5.94	12
Style Transfer	benchmark (test)	Style Similarity CSD Score 0.561	9
Video Customization	70-example benchmark 1.0 (test)	FaceSim Arc 0.59	9
Class-conditional video generation	Benchmark 17x256x256 resolution (test)	gFVD 210.9	9
Placement	100-task benchmark (test)	PE (cm) 0	8
Survival Prediction	33-task benchmark Survival prediction	C-index 58	8
Theorem Proving	Small-scale benchmark Overall	VR 33	8
Text-driven Style Transfer	Benchmark of 52 prompts and 20 style images 1.0 (test)	Text Alignment 0.235	8
Intent Classification	Benchmark 03	In-Scope Accuracy 84	8
Reward prediction	10-task benchmark Overall	Demo L (MSE) 0.02	7
Waypoint-hold under directionally varied push	Benchmark II Waypoint-Hold Under Directionally Varied Push	WP Success Count 3	7
Assistive Vision-Language Tasks	benchmark local images disk-replay protocol (n=120) v6	TTFT 518	6
Robot Learning	14-task benchmark	Final SR 95.6	6
Educational Video Generation	200-task benchmark	Success Rate 599	6
Commonsense Triple Validation	Benchmark ¬ATOMIC	Valid Precision 83	6
Image Classification	8-task benchmark	ID Score 94.8	6
Robot parameter extraction and forward kinematics calculation	Benchmark 1 (test)	M_C (Completeness/Score) 97	6
3D face reconstruction	benchmark High-Quality (HQ) 1.0	Median Error (mm) 1.58	6
Audit-relevance Classification	benchmark_1000 n=993 (train)	Accuracy 79.6	5
ODE Discovery	Benchmark 2	Model Complexity 17.2	5
Classification	Benchmark (BM) 10 clients, pathological non-IID	AUC-ROC 91	5
Electric Vehicle Routing Problem (ECVRP)	benchmark Small Instances	Objective Value 263.33	5
Speculative Decoding	Benchmark Second Turn	Block Efficiency 2.32	5

Showing 25 of 40 rows