
WildBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Creative Writing | WildBench | WildBench Score 83.9 | 45 |
| Instruction Following | WildBench (test) | Info Seek 58.6 | 27 |
| Open-ended generation | WildBench | WildBench 0.479 | 26 |
| Subjective Evaluation | WildBench | Score 0.8604 | 19 |
| General Instruction Following | WildBench | Score 92.6 | 19 |
| Instruction Following | WildBench | WB Score 63.18 | 18 |
| Open-ended Generation | WildBench (test) | WildBench Score 64.4 | 17 |
| Creative Writing | WildBench (test) | WildBench Score 64.4 | 15 |
| Real-world Query Evaluation | WildBench | WildBench Accuracy 71.5 | 14 |
| General Chat | WildBench | LLM Judge Score 68.16 | 12 |
| General chat | WildBench 2025 (test) | WB-Elo 1,062.4 | 12 |
| Open-ended reasoning | WildBench | Creative Score 57.05 | 5 |
| Open-ended text generation | WildBench | Score -1.7 | 4 |
| General Language Model Evaluation | WildBench | WildBench Score 26.95 | 2 |
Showing 14 of 14 rows