Linguistic Reasoning on BigBench Hard Hyperbaton
[Chart: Accuracy over time. Best result: PromptBreeder, 80.2 accuracy (May 18, 2026).]
Updated 14d ago
Evaluation Results
Method          Details                    Date      Accuracy
PromptBreeder   evaluations=30 prompt...   2026.05   80.2
ReElicit        evaluations=30 prompt...   2026.05   79.6
TextGrad        evaluations=30 prompt...   2026.05   79.1
OPRO            evaluations=30 prompt...   2026.05   78.3
APE             evaluations=30 prompt...   2026.05   77