Logical Reasoning on Boolean BBEH
[Chart: top result 25.5 Accuracy (CoT), Mar 14, 2026. Metrics shown: Accuracy, Operational Cost]
Updated 26d ago
Evaluation Results
Method  Model     Date     Accuracy  Operational Cost
CoT     Gemma3    2026.03  25.5      6,010
CoT     Qwen3     2026.03  24        4,977
CoT     Llama3.1  2026.03  18        5,055
ToT     Llama3.1  2026.03  6         12,450
DPTS    Llama3.1  2026.03  5         9,452
DST     Llama3.1  2026.03  4.5       5,752
ToT     Gemma3    2026.03  4.5       13,817
ToT     Qwen3     2026.03  4         11,751
DST     Gemma3    2026.03  3.5       6,874
DPTS    Qwen3     2026.03  3         9,955
DST     Qwen3     2026.03  3         5,953
DPTS    Gemma3    2026.03  2         11,501