Knowledge-intensive Reasoning on GPQA ambiguity-augmented
Figure: Accuracy over time. Top entry: DisambiguSLM, 42.8 (Apr 25, 2026).
Evaluation Results
Method            Date      Accuracy
DisambiguSLM      2026.04   42.8
SPO               2026.04   41.1
OPRO              2026.04   40.3
Step-back         2026.04   39.7
PromptAgent       2026.04   39.2
CoT               2026.04   38.9
TextGrad          2026.04   38.9
APE               2026.04   38.8
PromptBreeder     2026.04   38.5
Rephrase          2026.04   37.1
Naïve prompting   2026.04   36.5