General Reasoning on Average
Top result: EXPERT, 54.1 Avg @1 Score
[Chart: Avg @1 Score by date; x-axis label shown: Apr 1, 2026]
Updated 16d ago
Evaluation Results
Method      Setting                     Date      Avg @1 Score
EXPERT      Base Model=OLMo-3-7B,...    2026.04   54.1
ACTMat      Base Model=OLMo-3-7B,...    2026.04   45.9
REGMEAN     Base Model=OLMo-3-7B,...    2026.04   45.7
TSV         Base Model=OLMo-3-7B,...    2026.04   43.6
TA          Base Model=OLMo-3-7B,...    2026.04   41.9
AVERAGE     Base Model=OLMo-3-7B,...    2026.04   41.8
ISO-C       Base Model=OLMo-3-7B,...    2026.04   40.1
ZERO-SHOT   Base Model=OLMo-3-7B,...    2026.04   34.5