Linguistic Reasoning on BigBench Hard Snarks
[Chart: Accuracy. Top entry: TextGrad, 0.554 (May 18, 2026). Updated 14d ago.]
Evaluation Results
| Method | Details | Date | Accuracy |
| --- | --- | --- | --- |
| TextGrad | evaluations=30 prompt... | 2026.05 | 0.554 |
| ReElicit | evaluations=30 prompt... | 2026.05 | 0.551 |
| PromptBreeder | evaluations=30 prompt... | 2026.05 | 0.539 |
| OPRO | evaluations=30 prompt... | 2026.05 | 0.537 |
| APE | evaluations=30 prompt... | 2026.05 | 0.527 |