SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
About
Online surgical phase recognition (SPR) underpins context-aware operating-room systems and requires committing to a prediction at every frame from past context alone. Surgical video poses three demands that natural-video recognizers do not jointly address: procedures span tens of thousands of frames, time flows non-uniformly as long routine stretches are punctuated by brief phase-defining transitions, and the visual domain is narrow so backbone features are strongly correlated across channels. Existing recognizers either let per-frame cost grow with elapsed length, or hold cost bounded but advance state at a uniform rate with channel-independent dynamics, leaving the latter two demands unaddressed. We present SurgicalMamba, a causal SPR model built on Mamba2's structured state-space duality (SSD) that holds per-frame cost at O(d). It introduces three SSD-compatible components that jointly address these demands: a dual-path SSD block that separates long- and short-term regimes at the level of recurrent state; intensity-modulated stepping, a continuous-time time-warp that adapts the slow path's effective rate to phase-relevant information; and state regramming, a per-chunk Cayley rotation that opens cross-channel mixing in the otherwise axis-aligned SSM recurrence. The learned rotation planes inherit a phase-aligned structure without any direct supervision, offering an interpretable internal signature of surgical workflow. Across seven public SPR benchmarks, SurgicalMamba reaches state-of-the-art accuracy and phase-level Jaccard under strict online evaluation: 94.6%/82.7% on Cholec80 (+0.7 pp/+2.2 pp over the strongest prior) and 89.5%/68.9% on AutoLaparo (+1.7 pp/+2.0 pp), at 238.74 fps on a single GPU. Ablations isolate the contribution of each component. The code is publicly available at https://github.com/sukjuoh/Surgical-Mamba.
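To make the state-regramming idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a plain per-channel (axis-aligned) recurrence `h_t = decay * h_{t-1} + x_t` and applies a Cayley rotation to chosen pairs of state channels at each chunk boundary. The function names, the `planes` parameter, and the fixed rotation parameter `a` are all hypothetical; in the paper the rotations are learned, and the exact parameterization is not given in the abstract. For a skew-symmetric matrix A, the Cayley transform Q = (I - A)^{-1}(I + A) is always orthogonal; in a single plane it reduces to a rotation by angle 2·atan(a).

```python
import math


def cayley_rotation_2x2(a):
    """Cayley transform of the skew-symmetric matrix A = [[0, -a], [a, 0]].

    Returns Q = (I - A)^{-1} (I + A), written in closed form.
    Q is a rotation by angle 2 * atan(a) for any real a.
    """
    s = 1.0 + a * a
    c = (1.0 - a * a) / s   # cos(theta)
    n = 2.0 * a / s         # sin(theta)
    return [[c, -n], [n, c]]


def rotate_pair(state, i, j, q):
    """Apply the 2x2 rotation q to channels i and j of state, in place."""
    hi, hj = state[i], state[j]
    state[i] = q[0][0] * hi + q[0][1] * hj
    state[j] = q[1][0] * hi + q[1][1] * hj


def scan_with_chunk_rotation(xs, decay, chunk, planes):
    """Toy causal scan with a rotation at every chunk boundary.

    xs     : list of per-frame input vectors, each of length d
    decay  : per-channel decay factors (assumed, for illustration)
    chunk  : chunk length in frames
    planes : list of (i, j, a) tuples, hypothetical rotation planes

    Within a chunk, channels evolve independently (diagonal recurrence).
    At a chunk boundary, the rotation mixes the selected channel pairs.
    """
    d = len(decay)
    h = [0.0] * d
    outputs = []
    for t, x in enumerate(xs):
        if t > 0 and t % chunk == 0:
            for i, j, a in planes:
                rotate_pair(h, i, j, cayley_rotation_2x2(a))
        h = [decay[k] * h[k] + x[k] for k in range(d)]
        outputs.append(list(h))
    return outputs
```

Because the rotation is orthogonal, it preserves the norm of the state while moving information between channels, which a purely diagonal recurrence cannot do. The per-frame work in this sketch stays proportional to the state width plus the number of rotation planes, independent of how many frames have elapsed.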
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Surgical Phase Recognition | Cholec80 | -- | 65 |
| Surgical Phase Recognition | Cataract-101 | Accuracy 96.9 | 20 |
| Surgical Phase Recognition | m2cai16 | Accuracy 92.2 | 7 |
| Surgical Phase Recognition | Heidelberg (HeiCo) | Accuracy 72.1 | 6 |
| Surgical Phase Recognition | HeiChole 24-video (12:6:6) | Accuracy 86.4 | 6 |
| Surgical Workflow Analysis | Cholec80 (test) | Speed (fps) 119.1 | 4 |