Reasoning Efficiency on ScienceWorld (Seen)
[Chart: Success Rate (%) and Number of Tokens by date. Top result: BPO, 83.16 Success Rate (%), Aug 5, 2025.]
Updated 15d ago
Evaluation Results
Method            Size   Date      Success Rate (%)   Number of Tokens
BPO               8B     2025.08   83.16              112
Qwen-3-Thinking   8B     2025.08   57.45              763
Deepseek-R1       671B   2025.08   56.7               620