Logical Reasoning on ARC Challenge (test)
[Chart: best result EAGLE3, 95.7 Accuracy (Feb 7, 2026). Metrics: Accuracy, Tokens Processed, Latency. Updated 25d ago.]
Evaluation Results
| Method | Base Model | Date | Accuracy | Tokens Processed | Latency |
|---|---|---|---|---|---|
| EAGLE3 | Qwen3-4B-Th... | 2026.02 | 95.7 | 1,822 | 164.2 |
| Think | Qwen3-4B-Th... | 2026.02 | 95.6 | 1,812 | 156.5 |
| NoThink* | Qwen3-4B-Th... | 2026.02 | 95.1 | 1,889 | 159.8 |
| DEER | Qwen3-4B-Th... | 2026.02 | 94.6 | 1,011 | 200.3 |
| SpecExit | Qwen3-4B-Th... | 2026.02 | 94.5 | 588 | 71.4 |
| EAGLE3 | DeepSeek-R1... | 2026.02 | 59.2 | 1,378 | 496.4 |
| SpecExit | DeepSeek-R1... | 2026.02 | 50.3 | 500 | 253.7 |
| Vanilla | DeepSeek-R1... | 2026.02 | 49.9 | 1,917 | 628.5 |
| DEER | DeepSeek-R1... | 2026.02 | 47.5 | 1,029 | 531.3 |
| NoThink | DeepSeek-R1... | 2026.02 | 12.6 | 135 | 13.6 |