Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

About

Conventional cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, which makes them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks cancer risk across an entire population based solely on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns, often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, making this one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective, oncologist-supervised study of 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system is also practical to scale: a city-wide population of 1 million patients can be processed in under three hours on standard hardware, enabling seamless clinical integration. These results show that Can-SAVE delivers nationally significant improvements in cancer detection while operating within real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.
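
A minimal sketch of the general idea, turning medical-history events into survival-analysis variables that a gradient-boosting model can consume, is shown below. It is not the Can-SAVE implementation (the linked repository has that). It assumes one Kaplan-Meier curve per medical-history code, uses the estimated cumulative risk 1 - S(t) at a fixed horizon as the per-code value, and summarizes a patient by the highest value among their codes. The record layout and the functions `kaplan_meier`, `survival_at`, `fit_code_risks`, and `survival_feature` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Illustrative sketch only: one Kaplan-Meier curve per history code is an
# assumption made here, not the published Can-SAVE feature set.
# Hypothetical training record: (history codes, months of follow-up, cancer diagnosed?)
Record = Tuple[Set[str], float, bool]


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as a list of (event time, S(t)) steps."""
    survival, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)  # still under follow-up at time t
        events_at_t = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - events_at_t / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Right-continuous step function evaluated at `horizon` (1.0 before the first event)."""
    value = 1.0
    for t, s in curve:
        if t > horizon:
            break
        value = s
    return value


def fit_code_risks(train: List[Record], horizon: float) -> Dict[str, float]:
    """For each history code, estimated cumulative risk 1 - S(horizon) among patients carrying it."""
    groups: Dict[str, Tuple[List[float], List[bool]]] = defaultdict(lambda: ([], []))
    for codes, t, event in train:
        for code in codes:
            groups[code][0].append(t)
            groups[code][1].append(event)
    return {code: 1.0 - survival_at(kaplan_meier(ts, es), horizon)
            for code, (ts, es) in groups.items()}


def survival_feature(codes: Set[str], code_risks: Dict[str, float]) -> float:
    """Per-patient feature: highest code-level risk; codes unseen in training contribute 0."""
    return max((code_risks.get(c, 0.0) for c in codes), default=0.0)


if __name__ == "__main__":
    # Made-up toy data: (codes, months of follow-up, diagnosed?)
    train = [({"A"}, 6.0, True), ({"A"}, 12.0, False),
             ({"A", "B"}, 24.0, True), ({"B"}, 24.0, False)]
    risks = fit_code_risks(train, horizon=12.0)
    print(survival_feature({"A", "C"}, risks))  # about 0.333: code A's 12-month risk
```

In a full pipeline, such values would be appended to each patient's other tabular features and passed to a gradient-boosting classifier (for example scikit-learn's `HistGradientBoostingClassifier`). Fitting the curves on training patients only keeps a patient's own outcome out of their feature.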

Petr Philonenko, Vladimir Kokh, Pavel Blinov · 2023

Related benchmarks

Task	Dataset	Metric	Value
Cancer risk prediction	EHR-based cancer screening dataset 2016-2023 (test)	Average Precision (%)	22.8	18
Cancer detection	Breast Cancer Prospective experiment	Cancers per 1000 screenings	1.7	2
Cancer detection	Lung Cancer Prospective experiment	Cancers per 1000 screenings	2.1	2
Cancer detection	Colorectal Cancer Prospective experiment	Cancers per 1000 screenings	4.3	2
Cancer screening	Russian Retrospective Cohort 2020-2021 (Region A)	Detections	41	2
Cancer screening	Russian Retrospective Cohort 2020-2021 (Region B)	Detections	58	2
Cancer screening	Russian Retrospective Cohort 2021-2022 (Region C)	Detections	71	2
Cancer screening	Russian Retrospective Cohort 2016-2017 (Region D)	Detections	84	2
Cancer screening	Russian Retrospective Cohort 2022-2023 (Region E)	Detections	90	2
Cancer detection	12-month prospective pilot (426,210 patients)	Invited patients	3.21e+5	2
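
The metrics in this table can be made concrete with a short sketch. Assuming binary outcome labels and one risk score per patient, the code below computes step-wise Average Precision, AP = sum over n of (R_n - R_(n-1)) * P_n along the ranked list (the table reports it in percent, so 22.8 corresponds to the 0.228 in the abstract), the number of cancers found when only the k highest-ranked patients are screened, and cancers per 1000 screenings. The function names are hypothetical, and the authors' evaluation code may differ, for instance in how tied scores are handled.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum of (R_n - R_(n-1)) * P_n over the score-ranked list.

    Ties are broken by input order (Python's sort is stable even with
    reverse=True), a simplification; scikit-learn's average_precision_score
    instead treats tied scores as a single threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined when there are no positive cases")
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            # Recall rises by 1/total_pos here; weight it by precision at this rank.
            ap += (hits / rank) / total_pos
    return ap


def detections_at_volume(labels: Sequence[int], scores: Sequence[float], volume: int) -> int:
    """Cancers found if only the `volume` highest-risk patients are screened."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:volume])


def per_1000(cancers: int, screenings: int) -> float:
    """Detection rate expressed as cancers per 1000 screenings."""
    return 1000.0 * cancers / screenings


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]            # 1 = cancer diagnosed (toy data)
    scores = [0.9, 0.8, 0.7, 0.2, 0.1]  # model risk scores (toy data)
    print(average_precision(labels, scores))        # (1/1 + 2/3) / 2, about 0.833
    print(detections_at_volume(labels, scores, 2))  # 1
    print(per_1000(1, 2))                           # 500.0
```

Comparing `detections_at_volume` for two rankings at the same `volume` is what "higher detection rate at identical screening volumes" means operationally.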
