
Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

About

Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.
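The two retrospective figures quoted above, Average Precision and detections at a fixed screening volume, are both properties of a risk ranking. The following is a minimal sketch of how such metrics can be computed, not the Can-SAVE evaluation code: the function names `average_precision` and `detections_at_volume` and the ten-patient toy data are assumptions made purely for illustration.

```python
# Illustrative sketch only: hypothetical helper names and toy data,
# not taken from the Can-SAVE repository.

def ranked_indices(scores):
    """Patient indices sorted from highest to lowest predicted risk."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(labels, scores):
    """Mean of precision@k taken at every rank k that holds a true case."""
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(ranked_indices(scores), start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def detections_at_volume(labels, scores, volume):
    """True cases found when only the `volume` highest-risk patients are screened."""
    return sum(labels[i] for i in ranked_indices(scores)[:volume])


# Toy cohort: 1 = cancer diagnosed during follow-up, 0 = no diagnosis.
labels = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
# Hypothetical model risk scores for the same ten patients.
scores = [0.10, 0.90, 0.20, 0.80, 0.70, 0.05, 0.30, 0.15, 0.25, 0.40]

ap = average_precision(labels, scores)
found = detections_at_volume(labels, scores, volume=3)
print(f"AP = {ap:.3f}, cases found in top 3 = {found}")
```

In this toy cohort, screening the three highest-ranked patients finds two of the three cases, whereas three patients picked at random would be expected to contain 0.9 cases. Comparing detections at the same screening volume in this way is what a statement such as "4-10x higher detection rate at identical screening volumes" refers to.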

Petr Philonenko, Vladimir Kokh, Pavel Blinov • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Cancer risk prediction | EHR-based cancer screening dataset 2016-2023 (test) | Average Precision: 22.8 | 18 |
| Cancer Detection | Breast Cancer Prospective experiment | Cancers per 1000 Screenings: 1.7 | 2 |
| Cancer Detection | Lung Cancer Prospective experiment | Cancers per 1000 Screenings: 2.1 | 2 |
| Cancer Detection | Colorectal Cancer Prospective experiment | Cancers per 1000 Screenings: 4.3 | 2 |
| Cancer screening | Russian Retrospective Cohort Region A 2020-2021 | Detections: 41 | 2 |
| Cancer screening | Russian Retrospective Cohort 2020-2021 (Region B) | Detections: 58 | 2 |
| Cancer screening | Russian Retrospective Cohort 2021-2022 (Region C) | Detections: 71 | 2 |
| Cancer screening | Russian Region D Retrospective Cohort 2016-2017 | Detections: 84 | 2 |
| Cancer screening | Russian Retrospective Cohort 2022-2023 (Region E) | Detections: 90 | 2 |
| Cancer Detection | 12-month prospective pilot (426,210 patients) | Invited Patients: 3.21e+5 | 2 |