
Open-form mathematical reasoning on DeepMind MATHEMATICS (Exact-match accuracy)

Best result: 42.13 exact-match accuracy (TRIM)


Evaluation Results

Date       Exact-match accuracy
2025.10    42.13
2025.10    42.1
2025.10    41.92
2025.10    41.8
2025.10    40.84
2025.10    38.75
2025.10    8.3
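Exact-match accuracy, the metric used to rank the results above, counts a prediction as correct only when it is identical to the reference answer. The following is a minimal Python sketch of that idea, not the leaderboard's own scorer. It makes two assumptions that the page does not state: answers are compared as strings after stripping surrounding whitespace, and the score is reported as a percentage. The function name `exact_match_accuracy` and the sample answers are hypothetical.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that exactly match their reference.

    Assumption (hypothetical normalization): only leading/trailing whitespace
    is stripped before comparing; the leaderboard's real rules are not given.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    # Each exact string match contributes 1 (True) to the count.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


# Hypothetical model answers and reference answers for three questions.
preds = ["-7", "3/4", "12 "]
refs = ["-7", "0.75", "12"]
print(f"{exact_match_accuracy(preds, refs):.2f}")  # 66.67
```

Note that "3/4" is scored as wrong against "0.75" even though the two values are equal: under strict exact match, an answer written in a different but equivalent form earns no credit.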