Task Completion on Synthetic personalized interaction datasets (evaluation)
Top result: 8.48 Task Completion Score (History-augmented prompting), Feb 12, 2026.
[Chart: Task Completion Score and Absolute Delta. Updated 4d ago.]
Evaluation Results

| Method | Optimization Status | Links | Task Completion Score | Absolute Delta |
|---|---|---|---|---|
| History-augmented prompting | Va... | 2026.02 | 8.48 | - |
| Persona-based query rewriting | Va... | 2026.02 | 8.48 | - |
| Preference-based few-shot ICL | Va... | 2026.02 | 8.48 | - |
| Controller-guided prompting | Va... | 2026.02 | 8.48 | - |
| PPOpt | Va... | 2026.02 | 8.48 | - |
| PPOpt | PPOpt | 2026.02 | 8.26 | 0.22 |
| Preference-based few-shot ICL | PPOpt | 2026.02 | 7.94 | 0.54 |
| Controller-guided prompting | PPOpt | 2026.02 | 7.94 | 0.54 |
| History-augmented prompting | PPOpt | 2026.02 | 7.89 | 0.59 |
| Persona-based query rewriting | PPOpt | 2026.02 | 7.42 | 1.06 |
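The page does not define Absolute Delta. The listed values are consistent with it being the gap between the top Task Completion Score (8.48) and each entry's score, rounded to two decimals. The sketch below checks that reading against the rows with Optimization Status=PPOpt; the names `TOP_SCORE`, `ppopt_scores`, and `absolute_delta` are hypothetical and used only for illustration.

```python
# Assumption: Absolute Delta = top score - entry score, rounded to 2 decimals.
# Scores are copied from the rows with Optimization Status=PPOpt.
TOP_SCORE = 8.48

ppopt_scores = {
    "PPOpt": 8.26,
    "Preference-based few-shot ICL": 7.94,
    "Controller-guided prompting": 7.94,
    "History-augmented prompting": 7.89,
    "Persona-based query rewriting": 7.42,
}


def absolute_delta(score: float, top: float = TOP_SCORE) -> float:
    """Gap between the top score and this entry's score, rounded to 2 decimals."""
    return round(top - score, 2)


deltas = {method: absolute_delta(score) for method, score in ppopt_scores.items()}

for method, delta in deltas.items():
    print(f"{method}: {delta:.2f}")
```

Under this assumption the computed gaps match the Absolute Delta values listed for those five rows (0.22, 0.54, 0.54, 0.59, 1.06).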