Multi-step Reasoning on WordSort
[Figure: Accuracy by date (series: BoT; x-axis label: Jun 1, 2026)]
Evaluation Results
| Method | Models  | Date    | Accuracy |
|--------|---------|---------|----------|
| BoT    | GPT-4   | 2026.06 | 100      |
| eMoT   | Qwen-32B | 2026.06 | 96.8     |
| ToT    | GPT-4   | 2026.06 | 96.4     |