Task-free exploration on AMZ
[Chart: SR (%) and Skill Usage (%) by method, all results dated Oct 17, 2025. Top SR: 70.5 (WA → Target → AMZ).]
Evaluation Results
| Method | Setting | Date | SR (%) | Skill Usage (%) |
|---|---|---|---|---|
| WA → Target → AMZ | Training Setting=Seque... | 2025.10 | 70.5 | 43.2 |
| AMZ | Training Setting=Singl... | 2025.10 | 69.5 | 48.3 |
| AMZ + Target + WA | Training Setting=Self-... | 2025.10 | 66.7 | 36.4 |
| Target → AMZ → WA | Training Setting=Seque... | 2025.10 | 66.1 | 40.8 |
| AMZ → WA | Training Setting=Seque... | 2025.10 | 65.3 | 42.7 |
| AMZ → Target → WA | Training Setting=Seque... | 2025.10 | 65.2 | 43.3 |
| SkillWeaver* | Training Setting=Seque... | 2025.10 | 64.4 | 25.2 |
| WA | Training Setting=Singl... | 2025.10 | 50.2 | 3.3 |
| Target | Training Setting=Singl... | 2025.10 | 48.5 | 3.5 |
| Baseline | Iterations=– | 2025.10 | 47.3 | - |