
General Reasoning on MMLU-R

84.4 Accuracy (MMLU-R General Reasoning)

ROSA2

[Chart: results over time, Sep 27, 2025 to Mar 2, 2026]
Updated 1mo ago

Evaluation Results

Date       Score
2026.03    84.4
2026.03    75.8
2026.03    70.6
2026.03    63
2026.03    60.2
2026.03    60.2
2026.03    59.4
2026.03    57
2026.03    50
2026.03    48.8
2026.03    46.4
2026.03    43.4
2026.03    42.8
2026.03    39.8
2026.03    24
2026.03    23.6
2026.03    18.4
2026.03    12.4
2026.03    11.4
2026.03    9.4
2025.09    0.7036
2025.09    0.6847
2025.09    0.6837
2025.09    0.6731
2025.09    0.6727
2025.09    0.6217
2025.09    0.5135
2025.09    0.4579
2025.09    0.4536
2025.09    0.4218
2025.09    0.4114
2025.09    0.4068
2025.09    0.36
2025.09    0.334
2025.09    0.3046
2025.09    0.186
2025.09    0.1372
2025.09    0.11
2025.09    0.0907
2025.09    0.0727