
AIR-Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Sound Foundation | AIR-Bench 1.0 (test) | Score: 65.1 | 13
Safety | AIR-Bench | Average Score: 0.66 | 12
Paralinguistic speech understanding | AIR-Bench Speech (test) | Emotion Acc: 71.45 | 11
Chat Benchmark | AIR-Bench | Score (Speech Domain): 7.54 | 11
Speech Understanding | AIR-Bench | SER: 29.9 | 10
Retrieval | AIR-Bench English 24.04 | Wiki Score: 65.5 | 10
Open-formed Audio Question Answering | AIR-Bench Music | Score: 6.16 | 8
Audio Classification | AIR-Bench Speech | Emotion Acc (MELD): 47.16 | 8
Open-formed Audio Question Answering | AIR-Bench Sound | Score: 7.01 | 8
Question Answering | AIR-Bench Foundation | Accuracy: 36.8 | 8
Content Moderation | AIR-Bench Text + Image (test) | Precision: 83 | 8
Content Moderation | AIR-Bench Image Only (test) | Precision: 94 | 8
Content Moderation | AIR-Bench Text Only (test) | Precision: 94 | 8
Music Foundation Tasks | AIR-Bench Music 1.0 (test) | Inst. Classification Acc: 65.8 | 7
Speech Foundation | AIR-Bench Speech Foundation | Speech Grounding: 5,920 | 7
Speech Chat | AIR-Bench 1.0 (test) | Overall Score: 7.18 | 7
Gender Classification | Air-Bench | Accuracy: 0.905 | 6
Open-Ended Audio Understanding | AIR-Bench chat | AIR-Bench Chat Score: 6.8 | 3