| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Image Generation | HRS | Count F1 66 | 10 |
| Grounding Accuracy | HRS | Spatial Accuracy 45.01 | 8 |
| Grounding | HRS-Spatial | mIoU 0.372 | 8 |
| Prompt Fidelity | HRS dataset | CLIP Score 33.63 | 6 |
| Text-to-Image Generation | HRS benchmark | CLIP Score 33.63 | 2 |