Logical Reasoning on BOARD (BOARDGAMEQA)
Best result: 96.3 accuracy (Gemini 2.5 Pro+ASP)
[Chart: accuracy over time, May 16, 2023 to Apr 30, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| Gemini 2.5 Pro+ASP | Inference Pipeline=LLM... | 2026.04 | 96.3 |
| Gemini 2.5 Flash+ASP | Inference Pipeline=LLM... | 2026.04 | 94.7 |
| Gemini 2.5 Flash | Inference Pipeline=Bas... | 2026.04 | 93 |
| DS-R1-0528+ASP | Inference Pipeline=LLM... | 2026.04 | 91.7 |
| Gemini 2.5 Pro | Inference Pipeline=Bas... | 2026.04 | 89.8 |
| o4-mini | Inference Pipeline=Bas... | 2026.04 | 88.2 |
| o4-mini+ASP | Inference Pipeline=LLM... | 2026.04 | 85.2 |
| DS-R1-0528 | Inference Pipeline=Bas... | 2026.04 | 81 |
| SATLM | Language Model=code-da... | 2023.05 | 80.7 |
| SATLM | Language Model=code-da... | 2023.05 | 79.4 |
| DS-V3 | Inference Pipeline=Bas... | 2026.04 | 71.3 |
| COT | Language Model=code-da... | 2023.05 | 62.8 |
| COT | Language Model=code-da... | 2023.05 | 60.7 |
| STANDARD | Language Model=code-da... | 2023.05 | 44.6 |
| DS-V3+ASP | Inference Pipeline=LLM... | 2026.04 | 44 |