
Reasoning on GPQA (Accuracy, Loss)

32.8 Accuracy

AdamW

[Chart: accuracy over time, Sep 2, 2025 to Apr 10, 2026]
Updated 22d ago

Evaluation Results

Date      Accuracy  Loss
2026.04   32.8      2.005
2025.09   32.37     -
2026.04   32        1.968
2025.09   30.8      -
2025.09   30.8      -
2025.09   30.58     -
2025.09   30.36     -
2025.09   30.13     -
2025.09   29.91     -
2026.04   29.6      1.981
2025.09   29.24     -
2025.09   29.02     -
2025.09   28.79     -
2025.09   28.79     -
2025.09   28.79     -
2025.09   28.35     -
2025.09   28.12     -
2025.09   27.9      -
2025.09   27.79     -
2025.09   26.34     -
2025.09   25.67     -