Scientific Reasoning on GPQA Diamond (Acc, ARI, ABO, AIRW)

64 Accuracy

Base Model

11.79225.34638.952.454
May 5, 2026
Updated 28d ago

Evaluation Results

Date     Acc    ARI      ABO      AIRW
2026.05  64     7,203    6,768    6,768
2026.05  57.1   12,933   12,834   9,468
2026.05  55.9   8,521    8,108    8,108
2026.05  51.4   13,969   13,773   13,773
2026.05  49.3   15,572   15,709   7,738
2026.05  46.1   11,090   11,513   6,370
2026.05  32.5   13,148   13,372   7,103
2026.05  26.1   14,817   14,502   14,502
2026.05  23.1   7,524    8,578    3,864
2026.05  19     16,597   16,338   16,338
2026.05  17.2   19,379   19,094   19,094
2026.05  13.8   8,854    10,106   4,995