Multi-step Reasoning on Checkmate
[Chart: Accuracy by date. Highlighted point: eMoT, Accuracy 90, Jun 1, 2026]
Updated 1d ago
Evaluation Results
Method    Models      Date       Accuracy
eMoT      Qwen-32B    2026.06    90
BoT       GPT-4       2026.06    86.4
ToT       GPT-4       2026.06    49.2