
Mean

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Pixel-level manipulation detection | MEAN Across datasets | F1 Score 72.8 | 20 |
| Code Generation | Mean Across MBPP, CodeAlpacaPy, HumanEval, LiveCodeBench | Speedup 4.04 | 14 |
| Pick-and-place | Mean Across T1, T2, T3 | Mean Grasp Success Rate 99 | 10 |
| Offline Reinforcement Learning | Mean Medium-Replay | Normalized Return 76.45 | 7 |
| Offline Reinforcement Learning | Mean Medium | Normalized Return 71.33 | 7 |
| Offline Reinforcement Learning | Mean Medium-Expert | Normalized Return 98.5 | 7 |
| Physically-based rendering | Mean All scenes | PSNR 31.8 | 4 |
| Mathematical Reasoning | Mean across benchmarks | Speedup 2.12 | 2 |