
Mathematical Reasoning on TheoremQA

43.1 Accuracy

Qwen2-72B

[Chart: results over time, Jun 18, 2024 to Mar 20, 2025]
Updated 1mo ago

Evaluation Results

Date     Accuracy
2024.07  43.1
-        35.9
2024.06  35.9
-        34.9
2024.07  33.5
2024.06  32.5
-        32.3
2025.03  32.2
2024.06  32.2
-        29.3
2024.07  28.8
2025.03  28.1
2025.03  27.4
2025.03  27.2
2025.03  27
2025.03  24.6
2025.03  23.8
2025.03  23.4
2024.07  23.2
2025.03  22.9
2025.03  22.8
2025.03  20
2025.03  19.4
2025.03  19
2025.03  18.9
2025.03  18.9
2025.03  18.2
2025.03  18.1
2025.03  17
2025.03  17
2025.03  16.6
2025.03  16.2
2025.03  16.2
2025.03  16.2
2025.03  16.1
2025.03  15.5
2025.03  15.5
2025.03  15.2
2025.03  14.9
2025.03  14.4
2025.03  14
2025.03  14
2025.03  13.8
2025.03  13.6
2025.03  13
2025.03  12.8
2025.03  12.2
2025.03  11.1
2025.03  11
2025.03  11
2025.03  10.9
2025.03  10.6
2025.03  7.6
2025.03  6.8
2025.03  5.9