General Reasoning on ARC (Pass@1)
[Chart: Pass@1 over time. Best result: 96.2 (Qwen3), Oct 30, 2025. Updated 3d ago.]
Evaluation Results
| Method | Method details | Date | Pass@1 |
|---|---|---|---|
| Qwen3 | Base Model=8B, Trainin... | 2025.10 | 96.2 |
| GRPO | Base Model=8B, Trainin... | 2025.10 | 95.8 |
| GRPOExtraRollouts | Base Model=8B, Trainin... | 2025.10 | 95.7 |
| ICPO† | Base Model=8B, Trainin... | 2025.10 | 95.6 |
| ICPO | Base Model=8B, Trainin... | 2025.10 | 95.5 |
| GRPOExpertDomain | Base Model=8B, Trainin... | 2025.10 | 92.2 |
| GRPOExtraRollouts | Base Model=1.7B, Train... | 2025.10 | 88.9 |
| GRPOExpertDomain | Base Model=1.7B, Train... | 2025.10 | 88.8 |
| Qwen3 | Base Model=1.7B, Train... | 2025.10 | 88.3 |
| GRPO | Base Model=1.7B, Train... | 2025.10 | 88.3 |
| ICPO | Base Model=1.7B, Train... | 2025.10 | 88.1 |
| ICPO† | Base Model=1.7B, Train... | 2025.10 | 87.7 |