| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Spatial Reasoning | Our-Bench SpatialTree-Bench 1.0 (test) | Average Score: 57.8 | 16 |
| Spoken Question Answering | Our Bench | Accuracy: 76.34 | 8 |
| Compositional Generation | Our Bench | CLIP Score: 32.33 | 6 |
| Layout-based Generation | Our Bench (Layout only) | F1 Score: 44 | 5 |
| Layout-based Generation | Our Bench (Layout + Reference) | F1 Score: 35 | 4 |