Science QA on ARC easy
Accuracy: 77.9 (Vanilla)
[Chart: Accuracy over time, May 28, 2026 to Jun 1, 2026]
Updated 1d ago
Evaluation Results
Method       Configuration               Date     Accuracy
Vanilla      Backbone=Mistral-7B-In...   2026.05  77.9
DT-CD⋆       Backbone=Mistral-7B-In...   2026.05  77.8
COFT         Backbone=Mistral-7B-In...   2026.05  77.8
SDD          Backbone=Mistral-7B-In...   2026.05  77.4
DExperts     Backbone=Mistral-7B-In...   2026.05  77.2
Vanilla      Backbone=LLaMA-2-13B        2026.05  74.6
COFT         Backbone=LLaMA-2-13B        2026.05  74.5
DT-CD⋆       Backbone=LLaMA-2-13B        2026.05  74.4
SDD          Backbone=LLaMA-2-13B        2026.05  74
DExperts     Backbone=LLaMA-2-13B        2026.05  73.7
Mamba-2      model size=152M, mb=4       2026.06  36.4
SISA         model size=152M, ds=32...   2026.06  35.8
SISA         model size=152M, ds=64...   2026.06  35.4
SISA         model size=152M, ds=12...   2026.06  35
Mamba-3      model size=152M, mb=4       2026.06  34.9
SISA         model size=152M, ds=16...   2026.06  34.7
Transformer  model size=152M, mb=4       2026.06  33.3