Sequential Decision Making on WebShop
Best score: 0.88 (SYMPHONY-L), Jan 30, 2026
Metrics: Score, Success Rate
Evaluation Results
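| Method       | Configuration          | Date    | Score | Success Rate |
|--------------|------------------------|---------|-------|--------------|
| SYMPHONY-L   | Model Size=Stronger    | 2026.01 | 0.88  | 72           |
| Human Expert | LLM=GPT-4              | 2026.01 | 0.82  | 60           |
| SYMPHONY-S   | Model Size=Lightweight | 2026.01 | 0.82  | 56           |
| MASTER       | LLM=GPT-4              | 2026.01 | 0.80  | -            |
| LATS         | LLM=GPT-4              | 2026.01 | 0.76  | 38           |
| AgentKit     | LLM=GPT-4              | 2026.01 | 0.70  | -            |
| Fine-tuning  | LLM=GPT-4              | 2026.01 | 0.68  | 45           |
| Reflexion    | LLM=GPT-4              | 2026.01 | 0.64  | 35           |
| IL+RL        | LLM=GPT-4              | 2026.01 | 0.62  | 29           |
| IL           | LLM=GPT-4              | 2026.01 | 0.60  | 29           |
| ReAct        | LLM=GPT-4              | 2026.01 | 0.54  | 32           |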