3D Logic on Super-CLEVR
Top result: MIG, 97 Pass@1
[Chart: Pass@1, Pass@8, and Delta Pass@1; data point dated Feb 1, 2026]
Updated 1mo ago
Evaluation Results
Method                                            Pass@1   Pass@8   Delta Pass@1
MIG (evaluation_mode=In-Dom..., 2026.02)              97       99              8
GRPO (evaluation_mode=In-Dom..., 2026.02)             89       97              -
Base Model (evaluation_mode=In-Dom..., 2026.02)       46       80              -