Helpfulness Evaluation on MTBench

GPT-4o: Helpfulness 9.35

[Chart: Helpfulness over time; date shown: Feb 28, 2025]
Updated 1mo ago

Evaluation Results

Date       Helpfulness
2025.02    9.35
2025.02    9.22
2025.02    9.14
2025.02    8.83
2025.02    8.77
2025.02    8.61
2025.02    8
2025.02    7.59