
General Reasoning on MMLU-R

84.4 Accuracy (MMLU-R General Reasoning)

ROSA2

[Chart: results over time, Sep 27, 2025 to Mar 2, 2026]
Updated 2d ago

Evaluation Results

Method    Links
2026.03  84.4    -
2026.03  75.8    -
2026.03  70.6    -
2026.03  63      -
2026.03  60.2    -
2026.03  60.2    -
2026.03  59.4    -
2026.03  57      -
2026.03  50      -
2026.03  48.8    -
2026.03  46.4    -
2026.03  43.4    -
2026.03  42.8    -
2026.03  39.8    -
2026.03  24      -
2026.03  23.6    -
2026.03  18.4    -
2026.03  12.4    -
2026.03  11.4    -
2026.03  9.4     -
2025.09  0.7036  -
2025.09  0.6847  -
2025.09  0.6837  -
2025.09  0.6731  -
2025.09  0.6727  -
2025.09  0.6217  -
2025.09  0.5135  -
2025.09  0.4579  -
2025.09  0.4536  -
2025.09  0.4218  -
2025.09  0.4114  -
2025.09  0.4068  -
2025.09  0.36    -
2025.09  0.334   -
2025.09  0.3046  -
2025.09  0.186   -
2025.09  0.1372  -
2025.09  0.11    -
2025.09  0.0907  -
2025.09  0.0727  -
2026.05  -       76.8
2026.05  -       77
2026.05  -       78.8
2026.05  -       78.2
2026.05  -       84.4
2026.05  -       85.4
2026.05  -       87
2026.05  -       84.6
2026.05  -       84.5
2026.05  -       85.3
2026.05  -       87
2026.05  -       85.4
2026.05  -       86.1
2026.05  -       86.6
2026.05  -       90.9
2026.05  -       87.6