Science Reasoning on GPQA (test)

64.44 Accuracy

InjectRLOpt

[Chart: accuracy over time, May 6, 2024 to May 13, 2026]
Updated 20d ago

Evaluation Results

Date      Accuracy
2026.02   64.44
2026.02   63.33
2026.02   62.32
2026.02   60.61
2026.02   60.51
2026.02   60.4
2026.02   59.49
2026.02   58.69
2026.02   57.07
2026.02   54.14
2026.02   53.23
2026.02   52.22
2026.05   40.4
2026.05   39.39
2026.05   38.89
2026.05   38.38
2026.05   38.38
2026.05   37.88
2024.05   37.8
2026.05   37.37
2026.05   36.87
2024.05   36.8
2024.05   36.8
2025.04   34.82
2026.05   34.57
2026.05   31.82
2026.05   31.82
2024.05   31.8
2026.05   31.4
2026.05   30.81
2025.04   30.35
2025.04   30.13
2025.04   29.91
2025.04   29.69
2026.05   29.29
2025.04   28.79
2026.05   28.79
2025.04   28.57
2025.04   28.57
2025.04   28.35
2025.04   28.13
2025.04   26.79
2025.04   26.56
2025.04   23.44
2025.04   22.99
2026.01   21.68
2026.01   20.54
2025.04   20.31
2026.01   17.95
2026.01   16.84
2026.01   16.13
2026.01   15.14
2026.01   14.73
2026.01   14.54
2026.01   13.96
2026.01   13.66