Open-Ended Dialogue (out-of-distribution)
[Chart: scores over time, one panel each for MT-Bench, PKU-SafeRLHF, SHP, TruthfulQA and Average Score. MT-Bench panel: best result 68.3 (CausalRM), Jan 29, 2026.]
Evaluation Results
Method        Date     MT-Bench  PKU-SafeRLHF  SHP   TruthfulQA  Average Score
CausalRM      2026.01  68.3      60.9          53.9  66.2        62.3
Standard RM   2026.01  68.2      57.8          54.2  58.5        59.7
InfoRM        2026.01  66.7      60.1          50.4  61.9        59.8
GoalRM        2026.01  66.2      59.3          53.8  63.6        60.7
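The Average Score column matches the unweighted mean of the four benchmark scores, rounded to one decimal place. The page does not say this; it is inferred from the numbers. The sketch below checks that inference against the table and ranks the methods by their unrounded mean. `SCORES`, `REPORTED_AVERAGE`, `mean_score`, `matches_reported` and `leaderboard` are hypothetical names introduced only for this illustration.

```python
# Scores copied from the Evaluation Results table.
# Assumption (not stated on the page): "Average Score" is the unweighted
# mean of the four benchmarks, rounded to one decimal place.
SCORES = {
    "CausalRM":    {"MT-Bench": 68.3, "PKU-SafeRLHF": 60.9, "SHP": 53.9, "TruthfulQA": 66.2},
    "Standard RM": {"MT-Bench": 68.2, "PKU-SafeRLHF": 57.8, "SHP": 54.2, "TruthfulQA": 58.5},
    "InfoRM":      {"MT-Bench": 66.7, "PKU-SafeRLHF": 60.1, "SHP": 50.4, "TruthfulQA": 61.9},
    "GoalRM":      {"MT-Bench": 66.2, "PKU-SafeRLHF": 59.3, "SHP": 53.8, "TruthfulQA": 63.6},
}

# Averages exactly as reported in the table.
REPORTED_AVERAGE = {"CausalRM": 62.3, "Standard RM": 59.7, "InfoRM": 59.8, "GoalRM": 60.7}


def mean_score(scores: dict) -> float:
    """Unweighted mean over all benchmarks for one method (no rounding)."""
    return sum(scores.values()) / len(scores)


def matches_reported(method: str, tol: float = 0.05 + 1e-9) -> bool:
    """True if the reported average is the mean rounded to one decimal.

    Rounding to one decimal moves a value by at most 0.05, so we compare
    with that tolerance instead of calling round(), which is unreliable
    for values such as 62.325 that sit exactly on a rounding boundary.
    """
    return abs(mean_score(SCORES[method]) - REPORTED_AVERAGE[method]) <= tol


def leaderboard(table: dict) -> list:
    """Return (method, mean) pairs sorted from best to worst mean score."""
    rows = [(method, mean_score(scores)) for method, scores in table.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for method, avg in leaderboard(SCORES):
        status = "ok" if matches_reported(method) else "MISMATCH"
        print(f"{method:<12} mean={avg:.3f} reported={REPORTED_AVERAGE[method]} {status}")
```

All four unrounded means fall on a .x25 or .x75 boundary (for example, CausalRM: 249.3 / 4 = 62.325). This is why the check uses a ±0.05 tolerance rather than re-rounding in floating point.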