Arena

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Instruction Following | Arena Hard | Win Rate 94.9 | 77 |
| Conversational versatility | Arena-Hard | Win Rate 61.16 | 20 |
| Open-ended Generation | Arena-Hard | Score 84.6 | 14 |
| Technical problem-solving | Arena Hard | Win Rate 52.3 | 10 |
| Alignment | Arena-Hard | Score 48.1 | 5 |
| Alignment | Arena-Hard | Hard Prompt Gemini Score 70.4 | 4 |
| Human Preference Evaluation | Arena Creative Writing | Win Rate 23.4 | 3 |