
Science Reasoning on GPQA (Accuracy, Token count)

Best accuracy: 69.2

DTSR

Chart: accuracy over time (Apr 8, 2026)

Evaluation Results

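| Date    | Accuracy (%) | Token count |
|---------|--------------|-------------|
| 2026.04 | 69.2         | 5,677       |
| 2026.04 | 68.2         | 6,906       |
| 2026.04 | 67.2         | 6,395       |
| 2026.04 | 66.7         | 5,436       |
| 2026.04 | 66.2         | 5,916       |
| 2026.04 | 65.7         | 7,513       |
| 2026.04 | 64.1         | 6,775       |
| 2026.04 | 62.6         | 7,105       |
| 2026.04 | 60.1         | 8,861       |
| 2026.04 | 59.6         | 7,942       |
| 2026.04 | 57.1         | 8,306       |
| 2026.04 | 55.6         | 6,920       |
| 2026.04 | 55.6         | 1,286       |
| 2026.04 | 54.6         | 561         |
| 2026.04 | 53           | 1,271       |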
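Because the leaderboard reports both accuracy and token count, one way to read it is to ask which entries offer the best accuracy for their token budget. The following is a minimal Python sketch, not part of the original page, that transcribes the rows of the results table and computes the accuracy/token Pareto frontier, assuming higher accuracy and lower token count are both preferable. The names `Entry`, `ENTRIES`, and `pareto_frontier` are hypothetical, the method names are not recoverable from the page and are omitted, and the last row's accuracy of "53" is transcribed as 53.0.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    accuracy: float  # GPQA accuracy (%) as listed in the results table
    tokens: int      # token count as listed in the results table


# Hypothetical transcription of the results table rows (method names unavailable).
ENTRIES = [
    Entry(69.2, 5677), Entry(68.2, 6906), Entry(67.2, 6395), Entry(66.7, 5436),
    Entry(66.2, 5916), Entry(65.7, 7513), Entry(64.1, 6775), Entry(62.6, 7105),
    Entry(60.1, 8861), Entry(59.6, 7942), Entry(57.1, 8306), Entry(55.6, 6920),
    Entry(55.6, 1286), Entry(54.6, 561), Entry(53.0, 1271),
]


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the entries that no other entry beats on accuracy while using
    no more tokens, ordered from fewest to most tokens."""
    # Sweep from cheapest to most expensive; on equal token counts, visit the
    # more accurate entry first so the less accurate one is dropped.
    ordered = sorted(entries, key=lambda e: (e.tokens, -e.accuracy))
    frontier: list[Entry] = []
    best_accuracy = float("-inf")
    for entry in ordered:
        # Keep an entry only if it is strictly more accurate than every
        # entry that uses the same or fewer tokens.
        if entry.accuracy > best_accuracy:
            frontier.append(entry)
            best_accuracy = entry.accuracy
    return frontier


if __name__ == "__main__":
    for e in pareto_frontier(ENTRIES):
        print(f"{e.accuracy:.1f} accuracy at {e.tokens:,} tokens")
```

Under these assumptions, four rows sit on the frontier: 54.6 at 561 tokens, 55.6 at 1,286 tokens, 66.7 at 5,436 tokens, and 69.2 at 5,677 tokens. Every other row is matched or beaten in accuracy by an entry that uses fewer tokens.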