AIR-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Sound Foundation | AIR-Bench 1.0 (test) | Score: 65.1 | 13 |
| Safety | AIR-Bench | Average Score: 0.66 | 12 |
| Paralinguistic speech understanding | AIR-Bench Speech (test) | Emotion Acc: 71.45 | 11 |
| Chat Benchmark | AIR-Bench | Score (Speech Domain): 7.54 | 11 |
| Retrieval | AIR-Bench English 24.04 | Wiki Score: 65.5 | 10 |
| Question Answering | AIR-Bench Foundation | Accuracy: 36.8 | 8 |
| Content Moderation | AIR-Bench Text + Image (test) | Precision: 83 | 8 |
| Content Moderation | AIR-Bench Image Only (test) | Precision: 94 | 8 |
| Content Moderation | AIR-Bench Text Only (test) | Precision: 94 | 8 |
| Music Foundation Tasks | AIR-Bench Music 1.0 (test) | Inst. Classification Acc: 65.8 | 7 |
| Speech Foundation | AIR-Bench Speech Foundation | Speech Grounding: 5,920 | 7 |
| Speech Chat | AIR-Bench 1.0 (test) | Overall Score: 7.18 | 7 |
| Gender Classification | Air-Bench | Accuracy: 0.905 | 6 |
| Open-Ended Audio Understanding | AIR-Bench chat | AIR-Bench Chat Score: 6.8 | 3 |