Chatbot Evaluation on ArenaHard v2
[Chart: Hard Prompt Accuracy by method; highlighted point: Base, 14; date shown: Jan 28, 2026. Available metrics: Hard Prompt Accuracy, Creative Writing Accuracy.]
Updated 4d ago
Evaluation Results
Method                 Date      Hard Prompt Accuracy   Creative Writing Accuracy
Base                   2026.01   14                     13.7
SDPO                   2026.01   12.3                   11.1
GRPO                   2026.01   12                     10.8
SFT on self-teacher    2026.01   11.2                   8.9