E-commerce Agent Interaction on WebShop
Best result: MCTS, 70.4 Average Reward
[Chart: Average Reward by date, Jul 25, 2025 to Aug 5, 2025]
Updated 15d ago
Evaluation Results
| Method | Details | Date | Average Reward |
|---|---|---|---|
| MCTS | Base LLM=Llama2-70B | 2025.07 | 70.4 |
| BPO | Approach=System-2, Eva... | 2025.08 | 67.45 |
| Tree DPO | Base LLM=Llama2-70B | 2025.07 | 65.9 |
| SFT | Base LLM=Llama2-70B | 2025.07 | 65.1 |
| MPO | Approach=System-2, Eva... | 2025.08 | 55.2 |
| Qwen-2.5-7B-Instruct | Approach=System-1, Eva... | 2025.08 | 54.28 |
| SFT | Approach=System-2, Eva... | 2025.08 | 52.94 |
| ETO | Approach=System-2, Eva... | 2025.08 | 52.08 |
| o3-mini | Approach=System-2, Eva... | 2025.08 | 49.02 |
| Qwen-3-Thinking | Approach=System-2, Eva... | 2025.08 | 42.16 |
| Deepseek-R1 | Approach=System-2, Eva... | 2025.08 | 40.31 |
| Llama-3.1-8B-Instruct | Approach=System-1, Eva... | 2025.08 | 33.93 |