Knowledge-intensive Reasoning on GPQA ambiguity-augmented

Accuracy: 42.8

DisambiguSLM

[Chart: accuracy, axis ticks 36.248 to 41.351; Apr 25, 2026]
Updated 1mo ago

Evaluation Results

Accuracy  Date
42.8      2026.04
41.1      2026.04
40.3      2026.04
39.7
39.2      2026.04
38.9      2026.04
38.9      2026.04
38.8
38.5      2026.04
37.1
36.5