
AirBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Utility | AirBench 1-10 | Speech Score: 8.1 | 12 |
| Toxicity and Harmful Content Detection | Airbench | Score: 98.59 | 5 |
| Multiple-choice Question Answering | AirBench Foundational | Total Average Score: 44 | 4 |
| Jailbreak Attack Evaluation | AirBench | Attack Success Count: 1,390 | 2 |