| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Region Captioning | MedVidBench | RC LLM score | 3.442 | 5 |
| Video Summarization | MedVidBench | VS LLM score | 4.184 | 5 |
| Dense Video Captioning | MedVidBench | DVC LLM score | 3.797 | 5 |
| Temporal Action Grounding | MedVidBench | TAG mIoU@0.3 | 21.6 | 5 |
| Surgical Temporal Grounding | MedVidBench | STG mIoU | 0.202 | 5 |
| Surgical Action | MedVidBench | SA Accuracy | 24.4 | 5 |
| Next Action Prediction | MedVidBench | NAP Accuracy | 44.2 | 5 |
| Critical View of Safety | MedVidBench | CVS Accuracy | 91.4 | 5 |