
LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings

About

Traditional open-access datasets focusing on surgical procedures are often limited by their small size, typically consisting of fewer than 100 videos and less than 30 hours of footage, which leads to poor model generalization. To address this data limitation, a new dataset called LEMON has been compiled using a novel aggregation pipeline that collects high-resolution videos from online sources. Featuring an extensive collection of over 4K surgical videos totaling 938 hours (85 million frames) of high-quality footage across multiple procedure types, LEMON offers a comprehensive resource surpassing existing alternatives in size and scope, including two novel downstream tasks. To demonstrate the effectiveness of this diverse dataset, we introduce LemonFM, a foundation model pretrained on LEMON using a novel self-supervised augmented knowledge distillation approach. LemonFM consistently outperforms existing surgical foundation models across four downstream tasks and six datasets, achieving significant gains in surgical phase recognition (+9.5pp, +9.4pp, and +8.4pp in Jaccard on AutoLaparo, M2CAI16, and Cholec80), surgical action recognition (+4.4pp in mAP on CholecT50), surgical tool presence detection (+5.3pp and +10.2pp in mAP on Cholec80 and GraSP), and surgical semantic segmentation (+10.3pp in mDice on CholecSeg8k). LEMON and LemonFM will serve as foundational resources for the research community and industry, accelerating progress in developing autonomous robotic surgery systems and ultimately contributing to safer and more accessible surgical care worldwide. Dataset, code, and models are publicly available at https://github.com/visurg-ai/LEMON.
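The gains above are reported in percentage points (pp) of Jaccard, mAP, and mDice. As a reading aid, the following is a minimal sketch of the textbook definitions of the Jaccard index and the Dice coefficient, plus a frame-wise Jaccard averaged over phases. The function names are hypothetical and are not taken from the LEMON repository. The evaluation protocols used for the benchmarks (for example per-video averaging or boundary relaxation) may differ from this sketch.

```python
from typing import Sequence


def jaccard(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection over union of two equally long boolean sequences."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumption: two empty sets count as perfect agreement.
    return inter / union if union else 1.0


def dice(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumption: two empty sets count as perfect agreement.
    return 2 * inter / total if total else 1.0


def mean_phase_jaccard(pred_labels: Sequence[int],
                       true_labels: Sequence[int],
                       num_phases: int) -> float:
    """Average the per-phase Jaccard over phases that occur in either sequence."""
    scores = []
    for phase in range(num_phases):
        p = [label == phase for label in pred_labels]
        t = [label == phase for label in true_labels]
        if any(p) or any(t):  # skip phases absent from both sequences
            scores.append(jaccard(p, t))
    return sum(scores) / len(scores)


def gain_in_pp(new_score: float, old_score: float) -> float:
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return 100.0 * (new_score - old_score)


# Toy example with made-up frame labels (not data from the paper).
predicted = [0, 0, 1, 1, 2, 2]
annotated = [0, 1, 1, 1, 2, 2]
toy_score = mean_phase_jaccard(predicted, annotated, num_phases=3)
```

Note that a gain in percentage points is an absolute difference between two scores, not a relative improvement. Dice and Jaccard are monotonically related by Dice = 2J / (1 + J), so Dice is never lower than Jaccard on the same prediction.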

Chengan Che, Chao Wang, Tom Vercauteren, Sophia Tsoka, Luis C. Garcia-Peraza-Herrera • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Surgical Phase Recognition | Cholec80 | Top-1 Accuracy | 92.7 | 65 |
| Surgical workflow recognition | M2CAI 2016 | Accuracy | 68.4 | 39 |
| Surgical Phase Recognition | Autolaparo | Average F1 | 66.9 | 36 |
| Semantic segmentation | CholecSeg8K (test) | Dice Score | 81.3 | 20 |
| Surgical action recognition | CholecT50 | mAP | 61.9 | 15 |
| Surgical tool presence detection | Cholec80 | mAP | 93.7 | 15 |
| Instrument Presence Recognition | Grasp | mAP | 94.4 | 14 |
| Surgical Phase Recognition | M2CAI16 (test) | Accuracy | 89.9 | 10 |
| Surgical tool presence detection | Grasp | mAP | 76.4 | 7 |
| Binary video classification of surgery types | LEMON | Accuracy | 98.9 | 5 |
Showing 10 of 11 rows
