Multi-step Reasoning on Game24
[Chart: Accuracy over time; eMoT, Jun 1, 2026]
Updated 1d ago
Evaluation Results
Method    Model       Date       Accuracy
eMoT      Qwen-32B    2026.06    100
BoT       GPT-4       2026.06    82.4
ToT       GPT-4       2026.06    74