
Helpfulness Evaluation on MTBench

Top score: GPT-4o, 9.35 helpfulness

[Chart: helpfulness scores by date, Feb 23, 2025 to Feb 28, 2025]

Evaluation Results

Date      Helpfulness
2025.02   9.35
2025.02   9.22
2025.02   9.14
2025.02   8.83
2025.02   8.77
2025.02   8.61
2025.02   8
2025.02   7.59
2025.02   6.8
2025.02   6.8
2025.02   6.1
2025.02   5.9
2025.02   5.8
2025.02   5.8
2025.02   2.9
2025.02   2.7
2025.02   2.7
2025.02   2.4