OralGPT-Omni: A Versatile Dental Multimodal Large Language Model
About
Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties, yet dentistry remains underexplored, in part due to limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and challenges in reliability. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors dental radiologists' decision-making processes. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five modalities and five tasks, offering a comprehensive evaluation suite for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, dramatically outperforming GPT-5. Our work promotes intelligent dentistry and paves the way for future advances in dental image analysis. All code, the benchmark, and the models will be made publicly available.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open-ended VQA | MMOral-OPG | Teeth Accuracy | 53.5 | 55 |
| Multimodal Dental Question Answering | MMOral-Uni | II-Loc | 66.8 | 32 |
| Multimodal Dental Image Analysis | MMOral-Uni 1.0 (test) | Loc Score | 66.8 | 28 |
| Dental Panoramic X-ray Interpretation | OPG-Bench | Overall Score | 15.7 | 10 |
| Visual Question Answering | OPG-Bench VQA | Accuracy | 36.1 | 5 |