Helpfulness Evaluation on MTBench
[Chart: Helpfulness over time; top result GPT-4o, 9.35 (Feb 28, 2025)]
Evaluation Results
Model               Method               Date      Helpfulness
GPT-4o              NBF steering=false   2025.02   9.35
o1                  NBF steering=false   2025.02   9.22
Claude 3.5 Sonnet   NBF steering=false   2025.02   9.14
o1                  NBF steering=true    2025.02   8.83
GPT-4o              NBF steering=true    2025.02   8.77
Claude 3.5 Sonnet   NBF steering=true    2025.02   8.61
GPT-3.5-turbo       NBF steering=false   2025.02   8.00
GPT-3.5-turbo       NBF steering=true    2025.02   7.59
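The results are easier to compare as pairs, one pair per model with steering off and on. Below is a minimal Python sketch that assumes the helpfulness scores are transcribed exactly as listed in the results. `SCORES` and `steering_deltas` are hypothetical names used only for illustration; they are not part of any benchmark tooling.

```python
# Helpfulness scores transcribed from the results: model -> (steering=false, steering=true)
SCORES = {
    "GPT-4o": (9.35, 8.77),
    "o1": (9.22, 8.83),
    "Claude 3.5 Sonnet": (9.14, 8.61),
    "GPT-3.5-turbo": (8.00, 7.59),
}


def steering_deltas(scores):
    """Return model -> (absolute drop, relative drop in percent) when steering is enabled."""
    out = {}
    for model, (off, on) in scores.items():
        drop = round(off - on, 2)                  # absolute change in score points
        pct = round(100 * (off - on) / off, 1)     # change relative to the unsteered score
        out[model] = (drop, pct)
    return out


if __name__ == "__main__":
    for model, (drop, pct) in steering_deltas(SCORES).items():
        print(f"{model:<18} -{drop:.2f} ({pct:.1f}%)")
```

When run as a script, it prints each model's drop. According to these numbers, every model loses roughly 0.4 to 0.6 points once steering is enabled, and o1 (8.83) moves ahead of GPT-4o (8.77) in the steering=true setting.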