| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Action-relation hallucination evaluation | R-Bench Instance | Accuracy | 75.86 | 25 |
| Action-relation hallucination evaluation | R-Bench Image | Accuracy | 83.5 | 25 |
| Multimodal Relation Reasoning | R-Bench | Accuracy | 85.05 | 20 |
| Real-world Understanding | R-Bench | Distance Error | 55.5 | 19 |
| Multimodal Hallucination Evaluation | R-Bench | Dis | 66.68 | 13 |
| Robustness | R-Bench | R-Bench Dis Metric | 61.01 | 13 |
| Spatial-relation hallucination detection | R-Bench Instance | Accuracy | 77.39 | 8 |
| Spatial-relation hallucination detection | R-Bench Image | Accuracy | 81.13 | 8 |
| Visual Understanding | R-Bench (test) | MCQ (low) | 65.29 | 8 |
| Visual Reasoning | R-Bench-V Game | Accuracy | 20.7 | 5 |
| Visual Reasoning | R-Bench-V Physics | Accuracy | 71.3 | 5 |
| Relational Hallucination Evaluation | R-Bench | F1 Score | 79.1 | 5 |
| Scientific Reasoning | R-Bench | pass@1 Score | 61.68 | 2 |