Personalized Interaction on AI2 ARC Synthetic
Top result: PPOpt, Personalization Score 7.38 (Feb 12, 2026)
[Chart: Personalization Score and Task Completion Score over time]
Evaluation Results
Method   Backbone          Date     Personalization Score  Task Completion Score
PPOpt    Llama-3-8b-in...  2026.02  7.38                   9.5
PPOpt    GPT-oss-20b       2026.02  7.34                   9.48
PPOpt    Qwen3-8b          2026.02  6.6                    9.3
Vanilla  Llama-3-8b-in...  2026.02  4.62                   9.56
Vanilla  Qwen3-8b          2026.02  4.62                   9.56
Vanilla  GPT-oss-20b       2026.02  4.62                   9.56