A Synthetic Data-Driven Radiology Foundation Model for Pan-tumor Clinical Diagnosis
About
AI-assisted imaging has made substantial advances in tumor diagnosis and management. However, a major barrier to developing robust oncology foundation models is the scarcity of large-scale, high-quality annotated datasets, which are limited by privacy restrictions and the high cost of manual labeling. To address this gap, we present PASTA, a pan-tumor radiology foundation model built on PASTA-Gen, a synthetic data framework that generated 30,000 3D CT scans with pixel-level lesion masks and structured reports of tumors across ten organ systems. Leveraging this resource, PASTA achieves state-of-the-art performance on 45 of 46 oncology tasks, including non-contrast CT tumor screening, lesion segmentation, structured reporting, tumor staging, survival prediction, and MRI-modality transfer. To assess clinical applicability, we developed PASTA-AID, a clinical decision support system, and ran a retrospective simulated clinical trial across two scenarios. For pan-tumor screening on plain CT with fixed reading time, PASTA-AID increased radiologists' throughput by 11.1-25.1% and improved sensitivity by 17.0-31.4% and precision by 10.5-24.9%. In a diagnosis-aid workflow, it reduced segmentation time by up to 78.2% and reporting time by up to 36.5%. Beyond gains in accuracy and efficiency, PASTA-AID narrowed the expertise gap, enabling less-experienced radiologists to approach expert-level performance. Together, this work establishes an end-to-end, synthetic data-driven pipeline spanning data generation, model development, and clinical validation, demonstrating substantial potential for pan-tumor research and clinical translation.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Segmentation | Lung tumour | DSC 70.9 | 30 |
| Segmentation | Liver tumour | DSC 69.6 | 30 |
| Segmentation | Gallbladder cancer | DSC 64.9 | 15 |
| Pan-cancer Segmentation | Internal datasets | Lung Tumor DSC 52.1 | 14 |
| Pan-cancer Segmentation | Healthy Datasets CHAOS, TCIA, Atlas | CHAOS Score 45 | 10 |
| Pan-cancer Segmentation | Corona COVID-19 (External) | DSC 58.1 | 10 |
| Pan-cancer Segmentation | IRCADb liver tumors (External) | DSC 0.527 | 10 |
| Pan-cancer Segmentation | External Datasets Rider, Corona, IRCADb Average | Average DSC (%) 45 | 10 |
| Pan-cancer Screening | FLARE 2023 | DSC 34.6 | 10 |
| Pan-cancer Segmentation | Rider lung tumors (External) | DSC (%) 24.2 | 10 |
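
Most results above are reported as DSC, which we take to mean the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The following is a minimal sketch of that metric on flattened binary masks, not the evaluation code behind these benchmarks; the function name `dice_score` and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
from typing import Iterable


def dice_score(pred: Iterable[int], target: Iterable[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Hypothetical helper for illustration; inputs are sequences of 0/1 voxels.
    """
    pred_bits = [bool(v) for v in pred]
    target_bits = [bool(v) for v in target]
    if len(pred_bits) != len(target_bits):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labelled as lesion in both masks.
    intersection = sum(p and t for p, t in zip(pred_bits, target_bits))
    total = sum(pred_bits) + sum(target_bits)

    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Predicted lesion covers two voxels, reference covers one of them.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
print(f"DSC = {score:.3f} ({100 * score:.1f}%)")
```

Note that the table mixes scales: most entries look like percentages, while the IRCADb entry (0.527) looks like a fraction, so values should be normalised before comparing rows.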