
Scientific Question Answering on GPQA Diamond (ACC, TOK, η)

Accuracy (ACC): 53.03

Qwen3-4B-Thinking-2507 + DEER

[Chart: Accuracy (ACC) by date; axis label May 12, 2026]
Updated 21d ago

Evaluation Results

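| Date    | ACC (%) | TOK   | η     |
|---------|---------|-------|-------|
| 2026.05 | 53.03   | 4,762 | 1.414 |
| 2026.05 | 52.53   | 3,835 | 1.739 |
| 2026.05 | 51.01   | 6,475 | 1     |
| 2026.05 | 50.1    | 5,986 | 1.062 |
| 2026.05 | 50      | 4,237 | 1.498 |
| 2026.05 | 49.49   | 4,705 | 1.335 |
| 2026.05 | 48.99   | 7,568 | 0.822 |
| 2026.05 | 48.48   | 5,961 | 1.032 |
| 2026.05 | 47.58   | 4,294 | 1     |
| 2026.05 | 46.57   | 2,023 | 2.078 |
| 2026.05 | 45.86   | 2,643 | 1.566 |
| 2026.05 | 45.66   | 3,891 | 1.059 |
| 2026.05 | 45.45   | 6,420 | 0.899 |
| 2026.05 | 43.33   | 3,948 | 1.22  |
| 2026.05 | 42.32   | 3,773 | 1.247 |
| 2026.05 | 41.82   | 4,351 | 1.068 |
| 2026.05 | 41.31   | 4,592 | 1     |
| 2026.05 | 40.3    | 4,012 | 1.117 |
| 2026.05 | 39.8    | 1,987 | 2.227 |
| 2026.05 | 39.3    | 2,816 | 1.551 |
| 2026.05 | 38.28   | 4,540 | 0.937 |
| 2026.05 | 37.27   | 3,889 | 1.065 |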
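
The page does not define η. The values listed above are, however, numerically consistent with a relative efficiency of the form (ACC / ACC_base) × (TOK_base / TOK), where the baseline is a row with η = 1. The sketch below checks that reading against three rows that appear to share the 51.01 / 6,475 baseline. The formula and the function name `efficiency` are assumptions inferred from the numbers, not something the leaderboard states.

```python
def efficiency(acc, tok, base_acc, base_tok):
    """Assumed form of η: relative accuracy times relative token savings
    against a baseline row (the row whose η is 1)."""
    return (acc / base_acc) * (base_tok / tok)


# Baseline row (η = 1) copied from the results above.
BASE_ACC, BASE_TOK = 51.01, 6475

# (ACC, TOK, reported η) for rows that appear to use that baseline.
rows = [
    (53.03, 4762, 1.414),
    (52.53, 3835, 1.739),
    (48.99, 7568, 0.822),
]

for acc, tok, reported in rows:
    eta = efficiency(acc, tok, BASE_ACC, BASE_TOK)
    print(f"ACC={acc:5.2f}  TOK={tok:5d}  computed={eta:.3f}  reported={reported}")
```

Under this reading, η > 1 means a method gets more accuracy per generated token than its baseline, and η < 1 means it gets less. The other η = 1 rows (47.58 / 4,294 and 41.31 / 4,592) would serve as baselines for their own groups of rows.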