Score and Success Rate on WebShop
[Chart: Score and Success Rate. Highlighted point: Human Expert, Score 82.1, Oct 6, 2022.]
Updated 1mo ago
Evaluation Results
Method        Details                    Date     Score  Success Rate
Human Expert  Type=Human baseline        2022.10  82.1   59.6
ReAct         Setting=one-shot promp...  2022.10  66.6   40
IL+RL         Training=10,587 traini...  2022.10  62.4   28.7
Act           Setting=one-shot promp...  2022.10  62.3   30.1
IL            Training=1,012 human a...  2022.10  59.9   29.1