| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Multimodal Model Evaluation | MLLM-as-a-Judge v1.0 (test) | Overall Score: 49 | 16 |
| Reward Modeling | MLLM-as-a-Judge (MaaJ) | Accuracy: 72.18 | 13 |
| Pointwise Scoring | MLLM-as-a-Judge in-domain v1.0 (test) | ImageDC Score: 80.2 | 9 |