
Reasoning on BigBench Hard Penguins

Best accuracy: 44.1 (ReElicit)

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
ReElicit 2026.05		44.1
OPRO 2026.05		43.9
APE 2026.05		43.4
TextGrad 2026.05		33.1
PromptBreeder 2026.05		29.3