Tabular Reasoning on BBH Penguins in a Table
Best accuracy: 93, Zero-shot (Default Imp.)
[Chart: Accuracy by date (May 29, 2026); metrics shown: Accuracy, Cost (USD per 100 examples)]
Evaluation Results
Method                    Configuration              Date     Accuracy  Cost (USD per 100 examples)
Zero-shot (Default Imp.)  Workflow Type=Zero-sho...  2026.05  93        0.05
Dynamic Workflow (ReAct)  Workflow Type=Dynamic,...  2026.05  72        0.54
Static Workflow           Workflow Type=Static,...   2026.05  63        0.21