| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Audio Generation | evaluation benchmarks one-to-one | CLAP Score 42.71 | 6 |
| Text-to-Image Generation | evaluation benchmarks one-to-one | CLIP Score 31.6 | 6 |
| Audio-to-Image Generation | evaluation benchmarks one-to-one | AIS 78.17 | 4 |
| Image-to-Audio Generation | evaluation benchmarks one-to-one | AIS 82.89 | 4 |