CrossAlpaca-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Open-ended Question Answering | CrossAlpaca-Eval en 2.0 | GPT-4o Score: 8.58 | 8 |
| Open-ended Question Answering | CrossAlpaca-Eval zh 2.0 | GPT-4o Score: 8.53 | 4 |