
Science Question Answering on GPQA (accuracy)

63.51 Accuracy

SFT

[Chart: accuracy over time, Apr 23, 2025 to Apr 18, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2026.04   63.51
2026.04   61.47
2026.04   58.46
2026.04   56.94
2026.03   54.5
2026.04   53.27
2026.04   51.92
2026.04   50.43
2026.03   49
2026.03   48
2026.03   48
2026.03   47
2025.04   46.7
2026.04   45.2
2026.03   45
2026.04   44.52
2026.04   44.34
2026.03   43
2026.04   42.74
2025.04   42.2
2025.04   40.6
2026.03   40
2026.04   39.43
2026.04   39.04
2026.02   38.4
2026.04   37.82
2026.03   37
2026.03   37
2026.04   36.36
2026.02   36.1
2026.03   36
2026.04   35.98
2026.04   35.35
2025.09   34.97
2026.04   34.93
2025.09   33.33
2025.04   32.8
2026.04   32.32
2026.03   32
2025.09   31.52
2026.04   31.19
2026.03   31
2025.09   30.93
2026.04   30.72
2025.04   30.6
2026.04   30.56
2026.02   30.5
2026.04   30.39
2026.04   30.32
2026.04   30.18
2025.09   30.01
2025.04   29.7
2025.09   29.27
2025.09   28.28
2025.09   27.52
2025.09   27.02
2026.02   26.4
2025.09   26.26
2026.02   26.2
2026.02   25.3
2025.09   25.25
2026.02   24.9
2025.09   24.75
2025.04   22.8
2026.04   22.22
2026.04   20.13
2025.04   14.3
2025.04   6.3
2025.04   5.1