| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Domain Reasoning | DRBench (test) | Score 42.9 | 14 |
| Large Vision-Language Model Evaluation | DRBench BS | MCQ Score 29.68 | 14 |
| Large Vision-Language Model Evaluation | DRBench S Subset | MCQ Accuracy 47.22 | 14 |
| Large Vision-Language Model Evaluation | DRBench B | MCQ Score 27.04 | 14 |
| Agentic Task | DRBench | Score 43 | 10 |
| Citation URL Validity Analysis | DRBench | Non-resolving Rate 5.4 | 10 |