Task-free exploration on Target
Top result: AMZ → Target → WA, 77.3 SR (%), Oct 17, 2025
[Chart: SR (%) and Skill Usage (%) by method]
Updated 1mo ago
Evaluation Results
Method             Details                    Date     SR (%)  Skill Usage (%)
AMZ → Target → WA  Training Setting=Seque...  2025.10  77.3    24.3
Target             Training Setting=Singl...  2025.10  77      52.1
WA → Target → AMZ  Training Setting=Seque...  2025.10  76.8    23.3
AMZ + Target + WA  Training Setting=Self-...  2025.10  75.2    19.4
SkillWeaver*       Training Setting=Seque...  2025.10  74.2    18.3
Target → AMZ → WA  Training Setting=Seque...  2025.10  69.2    18.9
AMZ → WA           Training Setting=Seque...  2025.10  62.5    3.1
AMZ                Training Setting=Singl...  2025.10  61.5    3
WA                 Training Setting=Singl...  2025.10  61.2    2.8
Baseline           Iterations=–               2025.10  60.5    -