User behavior simulation on Clothings
Chart: Precision and F1 Score by method. Top result: STEAM, 72.47 Precision (Jan 23, 2026).
Evaluation Results
| Method        | Setting         | Date    | Precision | F1 Score |
|---------------|-----------------|---------|-----------|----------|
| STEAM         | Ratio (1:k)=1:1 | 2026.01 | 72.47     | 64.86    |
| AFL           | Ratio (1:k)=1:1 | 2026.01 | 71.6      | 62.21    |
| GPT-3.5-Turbo | Ratio (1:k)=1:1 | 2026.01 | 70.08     | 52.4     |
| STEAM         | Ratio (1:k)=1:3 | 2026.01 | 50        | 59.18    |
| AFL           | Ratio (1:k)=1:3 | 2026.01 | 48.2      | 57.74    |
| GPT-3.5-Turbo | Ratio (1:k)=1:3 | 2026.01 | 43.79     | 54.31    |
| STEAM         | Ratio (1:k)=1:9 | 2026.01 | 33.33     | 45.47    |
| AFL           | Ratio (1:k)=1:9 | 2026.01 | 20.95     | 32.16    |
| GPT-3.5-Turbo | Ratio (1:k)=1:9 | 2026.01 | 20.57     | 31.32    |
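
The page reports Precision and F1 Score but not recall. As a rough sketch, assuming the benchmark uses the standard F1 definition (the harmonic mean of precision and recall, which this page does not confirm), the recall implied by a row can be recovered from its two reported numbers. The function names `f1_score` and `implied_recall` are hypothetical helpers introduced here for illustration, not part of any benchmark code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall.

    Inputs and output share one scale (e.g. percentages, as on this page).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1).

    Assumption: the reported F1 follows the standard definition above.
    A finite solution exists only when 2P > F1.
    """
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no finite recall satisfies these values")
    return f1 * precision / denom


if __name__ == "__main__":
    # Example: the STEAM row at Ratio (1:k)=1:1 (Precision 72.47, F1 Score 64.86).
    recall = implied_recall(72.47, 64.86)
    print(f"implied recall: {recall:.2f}")
    # Round trip: plugging the implied recall back in reproduces the reported F1.
    print(f"recomputed F1: {f1_score(72.47, recall):.2f}")
```

If the benchmark averages F1 per user or per class rather than computing it from pooled counts, the implied recall is only an approximation.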