Linguistic Reasoning on BigBench Hard Snarks

Accuracy: 0.554

TextGrad

[Chart: accuracy over time. Y-axis ticks: 0.52592, 0.53321, 0.5405, 0.54779. Date label: May 18, 2026]
Updated 14d ago

Evaluation Results

Date      Accuracy   Method   Links
2026.05   0.554
2026.05   0.551
2026.05   0.539
2026.05   0.537
2026.05   0.527