Interactive Decision Making on ScienceWorld (OOD)
Figure: Score and Total Cost of evaluated methods (top score: 9.9, MTRouter; Apr 26, 2026).
Evaluation Results
| Method | Routing Level | Date | Score | Total Cost |
|---|---|---|---|---|
| MTRouter | Multi-Tu... | 2026.04 | 9.9 | 16.3 |
| RouterDC | Single-T... | 2026.04 | 5.5 | 2.5 |
| EmbedLLM | Single-T... | 2026.04 | 5.0 | 3.0 |
| GPT-5 | Single-M... | 2026.04 | 4.9 | 47.6 |
| AvengersPro | Single-T... | 2026.04 | 2.4 | 4.1 |
| Router-R1 | Multi-Tu... | 2026.04 | 2.1 | 21.0 |
| GPT-OSS-120B | Single-M... | 2026.04 | 1.1 | 4.2 |
| MiniMax-M2 | Single-M... | 2026.04 | 0.9 | 10.9 |
| Kimi-K2 | Single-M... | 2026.04 | 0.2 | 8.9 |
| LLM Router | Multi-Tu... | 2026.04 | -0.4 | 28.3 |
| Gemini-2.5-Flash-Lite | Single-M... | 2026.04 | -2.1 | 1.5 |
| DeepSeek-V3.2 | Single-M... | 2026.04 | -4.2 | 22.8 |
| Random Router | Multi-Tu... | 2026.04 | -8.1 | 20.3 |
| OpenRouter | Multi-Tu... | 2026.04 | -26.9 | 15.5 |
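Because the page plots Score against Total Cost, one way to read the table is to ask which methods are not beaten on both axes at once. The sketch below is a minimal illustration, not part of the benchmark: it assumes that a higher Score and a lower Total Cost are both better and that the cost values are directly comparable (the page does not state their units). The `Entry` type and the `pareto_front` helper are hypothetical names introduced here. The sketch keeps every method that no other method dominates, meaning no other method has both a score at least as high and a cost at least as low, with at least one of the two strictly better.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    method: str
    score: float
    total_cost: float


# Score and Total Cost values copied from the Evaluation Results table (units as reported).
ENTRIES: List[Entry] = [
    Entry("MTRouter", 9.9, 16.3),
    Entry("RouterDC", 5.5, 2.5),
    Entry("EmbedLLM", 5.0, 3.0),
    Entry("GPT-5", 4.9, 47.6),
    Entry("AvengersPro", 2.4, 4.1),
    Entry("Router-R1", 2.1, 21.0),
    Entry("GPT-OSS-120B", 1.1, 4.2),
    Entry("MiniMax-M2", 0.9, 10.9),
    Entry("Kimi-K2", 0.2, 8.9),
    Entry("LLM Router", -0.4, 28.3),
    Entry("Gemini-2.5-Flash-Lite", -2.1, 1.5),
    Entry("DeepSeek-V3.2", -4.2, 22.8),
    Entry("Random Router", -8.1, 20.3),
    Entry("OpenRouter", -26.9, 15.5),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one.

    Assumption: higher score is better and lower total cost is better.
    """
    no_worse = a.score >= b.score and a.total_cost <= b.total_cost
    strictly_better = a.score > b.score or a.total_cost < b.total_cost
    return no_worse and strictly_better


def pareto_front(entries: List[Entry]) -> List[Entry]:
    """Return the non-dominated entries, ordered from cheapest to most expensive."""
    front = [e for e in entries if not any(dominates(o, e) for o in entries)]
    return sorted(front, key=lambda e: e.total_cost)


if __name__ == "__main__":
    for e in pareto_front(ENTRIES):
        print(f"{e.method}: score={e.score}, total_cost={e.total_cost}")
```

With the values above, the non-dominated set is Gemini-2.5-Flash-Lite, RouterDC, and MTRouter. A pairwise check is quadratic in the number of methods, which is negligible for a 14-row leaderboard and keeps the dominance rule explicit.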