Text-to-Spatial Audio Generation on Spatial Audio Caption (test)
[Chart: MOS (Spatial Quality) by method, as of May 29, 2026; Ground Truth scores 4.65. Selectable metrics: MOS (Spatial Quality), FD, KL Divergence, MOS (Auditory Fidelity).]
Updated 2d ago
Evaluation Results
| Method | Details | Date | MOS (Spatial Quality) | FD | KL Divergence | MOS (Auditory Fidelity) |
|---|---|---|---|---|---|---|
| Ground Truth | - | 2026.05 | 4.65 | - | - | 4.76 |
| SwanSphere | Params=1.09B, Inf. Tim... | 2026.05 | 4.31 | 142.8 | 1.43 | 4.43 |
| OmniAudio(text) | Params=1.22B, Inf. Tim... | 2026.05 | 4.11 | 174.13 | 1.83 | 4.16 |
| Tango2+AS | Params=0.86B, Inf. Tim... | 2026.05 | 3.95 | 235.71 | 2.42 | 3.27 |
| AudioLDM-2+AS | Params=0.71B, Inf. Tim... | 2026.05 | 3.86 | 294.17 | 2.45 | 3.53 |
| MMAudio+AS | Params=1.03B, Inf. Tim... | 2026.05 | 3.75 | 313.26 | 2.77 | 3.44 |