
Mathematical Reasoning on Olympiad Bench (Accuracy, Generation Length)

85.2 Accuracy

BCR-Qwen3-4B (Ours)

[Chart: accuracy over time, Jun 3, 2025 to Apr 7, 2026]
Updated 9d ago

Evaluation Results

Date      Accuracy   Generation Length
2026.04   85.2       10,717
2026.04   83.3       13,069
2025.06   69.3       -
2026.04   67.3       9,395
2026.04   67.1       7,515
2025.06   66.4       -
2026.04   62.9       2,969
2025.06   62.7       -
2026.04   62.1       7,017
2026.04   61.5       4,352
2026.03   58.95      5,501
2026.03   57.92      7,372
2026.03   57.67      6,262
2026.03   57.27      5,721
2026.03   56.41      7,347
2026.03   55.7       5,299
2026.03   55.39      5,885
2026.04   55.2       9,755
2026.03   54.23      4,920
2026.03   54.03      4,419
2026.03   53.64      6,173
2026.04   53.5       5,765.1
2026.04   53.4       11,599
2026.04   51.7       5,406.2
2026.04   49.9       5,051.1
2026.04   49.8       5,318.2
2026.04   49.6       5,190.8
2026.04   47         1,885.5
2026.04   46.5       2,145.8
2026.04   45.9       4,871
2026.04   45.6       2,218
2026.04   44.9       3,140.4
2026.04   44.7       3,227.1
2026.01   41.6       6,164
2026.01   40.9       2,398
2026.04   40.9       2,349.8
2026.01   40         2,068
2026.01   39.9       2,915
2026.01   39.4       3,452
2026.04   39.4       2,427.1
2026.04   39         3,651.2
2026.04   38.7       1,120.2
2026.04   37.3       1,053.8
2026.04   36.4       3,057.5
2026.04   36.3       3,140.1
2026.01   36         1,659
2026.01   36         1,714
2026.04   35         5,814.2
2026.04   34.8       7,501.5
2026.04   34.8       2,038.5
2026.04   33.6       3,828.8
2026.04   33.5       2,539.7
2026.04   32.7       2,698.9
2026.04   31.9       4,534
2026.04   31.1       5,186.3
2026.01   30.4       7,389
2026.04   29.8       1,235.1
2026.01   29.2       898
2026.04   28.4       1,258.4
2026.01   27         987
2026.04   25.2       813
2026.01   22.8       608
2026.04   21.3       1,310.9
2026.02   14.44      14,326
2026.02   13.63      14,791
2026.02   13.63      13,956
2026.02   13.25      15,527
2026.02   12.81      14,410
2026.02   12.62      14,547
2026.02   12.62      14,189
2026.02   12.56      15,533
2026.04   11.9       1,337.7
2026.02   11.62      12,908