General Language Model Evaluation on WildBench
[Chart: WildBench Score. Data point shown: PUGC, 26.95, Jun 4, 2025]
Updated 4d ago
Evaluation Results
Method | Details | Date | WildBench Score | Creative Tasks Score | Planning & Reasoning Score | Math & Data Analysis Score | Information Seeking Score | Coding & Debugging Score
PUGC | Model=PUGC+DPO, Alignm... | 2025.06 | 26.95 | 46.56 | 33.36 | 11.43 | 40.2 | 17.16
Mistral-7B-instruct | Model=Mistral-7B-instruct | 2025.06 | 25.63 | 42.07 | 30.06 | 10.08 | 40.1 | 18.4