| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Truthfulness and Calibration Evaluation | Cross-LVLM Pooled Average (GQA, POPE, etc.) | ECE 7.1 | 8 |
| Multimodal Understanding | Cross-LVLM (Aggregate of GQA, GMAI-MMBench, POPE, MME-Finance, MMMU_Pro, LLaVA-Wild) (test) | ECE 13.8 | 8 |