Interactive Decision Making on ScienceWorld (test)
[Chart: Score and Total Cost ($) by method. Highlighted: MTRouter, Score 53.8. Apr 26, 2026.]
Evaluation Results

| Method | Routing Level | Date | Score | Total Cost ($) |
|---|---|---|---|---|
| MTRouter | Multi-Tu... | 2026.04 | 53.8 | 5.7 |
| GPT-5 | Single-M... | 2026.04 | 48.4 | 13.9 |
| Router-R1 | Multi-Tu... | 2026.04 | 42.1 | 12.6 |
| AvengersPro | Single-T... | 2026.04 | 36.8 | 4.1 |
| EmbedLLM | Single-T... | 2026.04 | 30.4 | 5.6 |
| GPT-OSS-120B | Single-M... | 2026.04 | 26.6 | 0.5 |
| RouterDC | Single-T... | 2026.04 | 23.1 | 3.3 |
| Random Router | Multi-Tu... | 2026.04 | 21.7 | 3.9 |
| LLM Router | Multi-Tu... | 2026.04 | 19.8 | 12.2 |
| DeepSeek-V3.2 | Single-M... | 2026.04 | 13.1 | 2.9 |
| Kimi-K2 | Single-M... | 2026.04 | 5.2 | 2.5 |
| Gemini-2.5-Flash-Lite | Single-M... | 2026.04 | 4.2 | 0.3 |
| MiniMax-M2 | Single-M... | 2026.04 | -0.5 | 3.2 |
| OpenRouter | Multi-Tu... | 2026.04 | -26.4 | 3 |