Logical Reasoning on BigBench Hard Boolean Expressions
Best result: ReElicit, 76.8 accuracy
[Chart: accuracy by date, May 18, 2026 to May 26, 2026]
Updated 7d ago
Evaluation Results
Method          Details                      Date      Accuracy
ReElicit        evaluations=30 prompt...     2026.05   76.8
DART            Base Model=Qwen3-0.6B        2026.05   73.9
GRPO            Base Model=Qwen3-0.6B        2026.05   73.7
STaR            Base Model=Qwen3-0.6B        2026.05   73
TESSY           Base Model=Qwen3-0.6B        2026.05   72.8
Base            Base Model=Qwen3-0.6B        2026.05   72.6
APE             evaluations=30 prompt...     2026.05   71.9
TextGrad        evaluations=30 prompt...     2026.05   66.9
OPRO            evaluations=30 prompt...     2026.05   65
PromptBreeder   evaluations=30 prompt...     2026.05   64.3
Original-SFT    Base Model=Qwen3-0.6B        2026.05   63
Original-SFT    Base Model=Qwen2.5-0.5...    2026.05   56.2
GRPO            Base Model=Qwen2.5-0.5...    2026.05   50.3
DART            Base Model=Qwen2.5-0.5...    2026.05   50.1
STaR            Base Model=Qwen2.5-0.5...    2026.05   47.1
Base            Base Model=Qwen2.5-0.5...    2026.05   46.6
TESSY           Base Model=Qwen2.5-0.5...    2026.05   39.4