Logical Reasoning on BoardgameQA BBEH
[Chart: Accuracy and Cost by method. Highlighted point: CoT, accuracy 0.34, Mar 14, 2026.]
Updated 26d ago
Evaluation Results
| Method | Model    | Links   | Accuracy | Cost   |
|--------|----------|---------|----------|--------|
| CoT    | Qwen3    | 2026.03 | 0.34     | 4,425  |
| CoT    | Gemma3   | 2026.03 | 0.33     | 5,311  |
| CoT    | Llama3.1 | 2026.03 | 0.315    | 4,501  |
| DST    | Llama3.1 | 2026.03 | 0.14     | 4,307  |
| DST    | Qwen3    | 2026.03 | 0.12     | 4,560  |
| ToT    | Llama3.1 | 2026.03 | 0.1      | 15,004 |
| DPTS   | Llama3.1 | 2026.03 | 0.095    | 9,604  |
| ToT    | Qwen3    | 2026.03 | 0.08     | 14,220 |
| DPTS   | Qwen3    | 2026.03 | 0.08     | 9,953  |
| ToT    | Gemma3   | 2026.03 | 0.06     | 16,924 |
| DST    | Gemma3   | 2026.03 | 0.05     | 5,218  |
| DPTS   | Gemma3   | 2026.03 | 0.035    | 11,701 |