Math & Code Reasoning on ARC Easy
[Chart: Score over time. Best result: DiReCT, Score 47, May 29, 2026.]
Updated 2d ago
Evaluation Results
| Method | Backbone | Date | Score |
|---|---|---|---|
| DiReCT | Llama-1.1B | 2026.05 | 47 |
| GradNorm (IS) | Llama-1.1B | 2026.05 | 44.3 |
| InfoBatch | Llama-1.1B | 2026.05 | 43.7 |
| Perplexity-based | Llama-1.1B | 2026.05 | 42.4 |
| Uniform Sampling | Llama-1.1B | 2026.05 | 41.3 |
| Loss-based | Llama-1.1B | 2026.05 | 41 |
| DiReCT | GPT-2-Medium... | 2026.05 | 34 |
| InfoBatch | GPT-2-Medium... | 2026.05 | 31.7 |
| GradNorm (IS) | GPT-2-Medium... | 2026.05 | 30.6 |
| Perplexity-based | GPT-2-Medium... | 2026.05 | 29.8 |
| Loss-based | GPT-2-Medium... | 2026.05 | 28.4 |
| Uniform Sampling | GPT-2-Medium... | 2026.05 | 27.2 |