General Chat on WildBench 2025 (test)
[Chart: WB-Elo. Best result: SR-GRPO, 1,062.4 WB-Elo (Dec 2, 2025)]
Evaluation Results
| Method | Model | Date | WB-Elo |
|---|---|---|---|
| SR-GRPO | Qwen2.5-1.5B-Ins... | 2025.12 | 1,062.4 |
| RM | Qwen2.5-1.5B-Ins... | 2025.12 | 1,043.3 |
| Self-Reward | Qwen2.5-1.5B-Ins... | 2025.12 | 1,041.2 |
| Perplexity | Qwen2.5-1.5B-Ins... | 2025.12 | 1,040.9 |
| IPO | Qwen2.5-1.5B-Ins... | 2025.12 | 1,037.7 |
| Base | Qwen2.5-1.5B-Ins... | 2025.12 | 1,036.2 |
| SR-GRPO | DeepSeek-R1-Dist... | 2025.12 | 932.5 |
| IPO | DeepSeek-R1-Dist... | 2025.12 | 922.4 |
| Self-Reward | DeepSeek-R1-Dist... | 2025.12 | 919.2 |
| RM | DeepSeek-R1-Dist... | 2025.12 | 918.4 |
| Perplexity | DeepSeek-R1-Dist... | 2025.12 | 917 |
| Base | DeepSeek-R1-Dist... | 2025.12 | 913.5 |