Streaming Speech Generation on Streaming generation scenarios (0.6s speech chunk)
[Chart: First-chunk Latency (ms); VocalNet-MDM, 368.67 ms, Feb 9, 2026]
Evaluation Results
| Method | Settings | Date | First-chunk Latency (ms) |
|---|---|---|---|
| VocalNet-MDM | Diffusion steps=1, Har... | 2026.02 | 368.67 |
| VocalNet-MDM | Diffusion steps=2, Har... | 2026.02 | 373.29 |
| VocalNet-MDM | Diffusion steps=4, Har... | 2026.02 | 382.36 |
| VocalNet-MDM | Diffusion steps=8, Har... | 2026.02 | 402.19 |
| VocalNet-MDM | Diffusion steps=16, Ha... | 2026.02 | 427.45 |
| VocalNet-8B | Hardware=single L20 GP... | 2026.02 | 462.32 |
| Baseline-AR | Training objective=MTP... | 2026.02 | 481.22 |
| VITA-Audio | Hardware=single L20 GP... | 2026.02 | 512.64 |
| Baseline-AR | Training objective=NTP... | 2026.02 | 555.86 |
| SLAM-Omni | Hardware=single L20 GP... | 2026.02 | 742.32 |
| GLM-4-Voice | Hardware=single L20 GP... | 2026.02 | 1,066.02 |
| MiniCPM-o | Hardware=single L20 GP... | 2026.02 | 1,329.52 |
| Kimi-Audio | Hardware=single L20 GP... | 2026.02 | 1,371.48 |