Helpfulness Evaluation on MTBench
[Chart: Helpfulness over time, Feb 23 to Feb 28, 2025. Top result: GPT-4o, 9.35.]
Evaluation Results
| Method | Setting | Date | Helpfulness (MT-Bench score) |
|---|---|---|---|
| GPT-4o | NBF steering=false | 2025.02 | 9.35 |
| o1 | NBF steering=false | 2025.02 | 9.22 |
| Claude 3.5 Sonnet | NBF steering=false | 2025.02 | 9.14 |
| o1 | NBF steering=true | 2025.02 | 8.83 |
| GPT-4o | NBF steering=true | 2025.02 | 8.77 |
| Claude 3.5 Sonnet | NBF steering=true | 2025.02 | 8.61 |
| GPT-3.5-turbo | NBF steering=false | 2025.02 | 8.00 |
| GPT-3.5-turbo | NBF steering=true | 2025.02 | 7.59 |
| Llama3.1-8B-Instruct | FL method=FedAvg | 2025.02 | 6.8 |
| Llama3.1-8B-Instruct | FL method=SCAFFOLD | 2025.02 | 6.8 |
| FL + Safety filter + CAI | FL method=FedAvg | 2025.02 | 6.1 |
| FL + CAI | FL method=SCAFFOLD | 2025.02 | 5.9 |
| FL + CAI | FL method=FedAvg | 2025.02 | 5.8 |
| FL + Safety filter + CAI | FL method=SCAFFOLD | 2025.02 | 5.8 |
| FL | FL method=SCAFFOLD | 2025.02 | 2.9 |
| FL | FL method=FedAvg | 2025.02 | 2.7 |
| FL + Safety filter | FL method=SCAFFOLD | 2025.02 | 2.7 |
| FL + Safety filter | FL method=FedAvg | 2025.02 | 2.4 |
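The table contains two families of paired rows: each frontier model is scored with NBF steering off and on, and each FL pipeline is scored under FedAvg and under SCAFFOLD. The small Python sketch below makes those pairwise differences explicit. The scores are hand-transcribed from the table. The names `NBF_ROWS`, `FL_ROWS`, and `paired_deltas` are hypothetical helpers for illustration and are not part of the benchmark or any associated tooling.

```python
# Helpfulness scores hand-transcribed from the Evaluation Results table.
# NBF steering rows: model -> (score with steering=false, score with steering=true)
NBF_ROWS = {
    "GPT-4o": (9.35, 8.77),
    "o1": (9.22, 8.83),
    "Claude 3.5 Sonnet": (9.14, 8.61),
    "GPT-3.5-turbo": (8.0, 7.59),
}

# FL rows: method -> (score with FedAvg, score with SCAFFOLD)
FL_ROWS = {
    "Llama3.1-8B-Instruct": (6.8, 6.8),
    "FL": (2.7, 2.9),
    "FL + Safety filter": (2.4, 2.7),
    "FL + CAI": (5.8, 5.9),
    "FL + Safety filter + CAI": (6.1, 5.8),
}


def paired_deltas(rows):
    """Return (second score - first score) per row, rounded to 2 decimals
    to hide floating-point noise such as 8.77 - 9.35 = -0.5800000000000001."""
    return {name: round(b - a, 2) for name, (a, b) in rows.items()}


# Negative values mean enabling NBF steering lowered helpfulness.
steering_effect = paired_deltas(NBF_ROWS)
# Positive values mean SCAFFOLD scored higher than FedAvg.
scaffold_vs_fedavg = paired_deltas(FL_ROWS)

if __name__ == "__main__":
    for name, d in steering_effect.items():
        print(f"NBF steering on, {name}: {d:+.2f}")
    for name, d in scaffold_vs_fedavg.items():
        print(f"SCAFFOLD vs FedAvg, {name}: {d:+.2f}")
```

Run as a script, it shows that enabling NBF steering lowers helpfulness for all four models, by between 0.39 (o1) and 0.58 (GPT-4o) points. The gap between FedAvg and SCAFFOLD stays within 0.3 points for every FL configuration.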