Astra: a generalizable report generation foundation model for 3D computed tomography
About
CT interpretation requires radiologists to review hundreds of volumetric slices per examination, making reporting time-consuming and highly expertise-dependent. Automated CT report generation offers a promising route to improving clinical efficiency, yet the field still lacks a generalizable CT report generation foundation model that supports multi-region reporting and remains robust across external real-world cohorts. Intrinsic inconsistencies in reporting style and diagnostic terminology across cohorts make naive joint training prone to noisy textual supervision, which limits model generalizability. Here we present Astra, a generalizable CT report generation foundation model trained on 90,678 thoracoabdominal CT-report pairs (CTRgDB) with 353,671 abnormalities spanning eight organ systems. By harmonizing report style and further refining diagnostic consistency via reinforcement learning, Astra achieves style-consistent and diagnostically accurate report generation across diverse anatomical regions and institutions. Evaluated on CTRgDB and six external cohorts, Astra achieves state-of-the-art performance with a 44.1% average improvement in fine-grained diagnostic metrics (P<0.001). In real-world clinical workflows, Astra assistance accelerates chest report drafting by 29.6% and improves abdominal report completeness by 11.3% (P<0.001). Astra also demonstrates broad utility as a foundation for CT AI development, improving downstream diagnostic performance and scaling vision-language pretraining through high-quality report synthesis. Overall, Astra serves as a broadly accessible clinical assistant and as pivotal infrastructure for the next generation of AI-powered healthcare.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Report Generation | CT-RATE | -- | -- | 26 |
| Classification | CT-RATE (test) | Micro Precision | 57.89 | 14 |
| Fine-grained captioning | INSPECT | RaTE Score | 33.05 | 14 |
| Medical image captioning | CT-RATE (test) | RaTE Score | 0.351 | 14 |
| Medical Image Classification | BIMCV (test) | Micro Precision | 35.72 | 14 |
| Medical Report Generation | BIMCV n=1,505 cases (test) | RaTE Score | 0.2624 | 14 |
| Medical Report Generation | MERLIN (test) | RaTE Score | 35.64 | 14 |
| Natural language generation | INSPECT | BLEU-1 | 0.4622 | 14 |
| Natural language generation | BIMCV | BLEU-1 | 40.2 | 14 |
| Natural language generation | Merlin | BLEU-1 | 0.3898 | 14 |