Linguistic Reasoning on BigBench Hard Disambiguation QA
Best result: ReElicit, 55.1 accuracy
[Chart: accuracy by method, May 18, 2026]
Evaluation Results
Method          Configuration              Date      Accuracy
ReElicit        evaluations=30 prompt...   2026.05   55.1
TextGrad        evaluations=30 prompt...   2026.05   53.2
OPRO            evaluations=30 prompt...   2026.05   52.4
PromptBreeder   evaluations=30 prompt...   2026.05   51.6
APE             evaluations=30 prompt...   2026.05   51.4
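The short Python sketch below is one way to read the margins between methods. It ranks the five accuracy values listed in the evaluation results and computes each method's gap to the best score. The scores are copied directly from the table. The helper name `gaps_to_best` is hypothetical and does not come from the leaderboard. The sketch assumes the reported numbers can be compared directly, since every row lists the same `evaluations=30` setting.

```python
# Accuracy values copied from the leaderboard rows (all listed with evaluations=30).
results = {
    "ReElicit": 55.1,
    "TextGrad": 53.2,
    "OPRO": 52.4,
    "PromptBreeder": 51.6,
    "APE": 51.4,
}


def gaps_to_best(scores):
    """Hypothetical helper: return (method, accuracy, gap to best), highest accuracy first."""
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Round to one decimal so float noise (e.g. 1.8999999999999986) prints as 1.9.
    return [(name, acc, round(best - acc, 1)) for name, acc in ranked]


for name, acc, gap in gaps_to_best(results):
    print(f"{name:<14} {acc:5.1f}  gap {gap:.1f}")
```

Under these assumptions, the whole field spans 3.7 accuracy points, from ReElicit down to APE.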