
Knowledge Evaluation on GPQA

90.4 Accuracy

CoT2-Meta

[Chart: accuracy over time, Nov 28, 2025 to Mar 30, 2026]
Updated 18d ago

Evaluation Results

Date      Accuracy
2026.03   90.4
2026.03   85.2
2026.03   83.5
2026.03   79.6
2026.03   74.2
2026.01   59.39
2026.01   56.5
2026.01   56.25
2026.02   55.13
2026.01   50.9
2026.01   47.51
2026.01   46.97
2026.02   45
-         44.87
2026.01   42.42
2026.01   41.67
2026.01   41.06
2026.01   40.1
2026.01   39.38
2026.01   38.64
2026.01   38.64
2026.01   36.36
2026.01   36.36
2025.11   34.85
2026.02   34.3
2026.01   34.09
2026.02   33.8
2026.01   33.33
2026.01   33.33
2026.01   31.9
2026.01   31.7
2025.11   30.6
2025.11   29.51
2025.11   29.29
-         29.2
2026.02   28.8
2026.02   28.8
2026.02   27.3
2026.02   27.2
2026.02   27.2
2025.11   26.57
2025.11   26.46
2025.11   26.31
2026.01   25.7
2026.01   24.24
2026.02   24.2
2026.01   23.48
2026.02   21.7
2026.02   20.7
2026.02   16.7
2026.02   16.7