Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding
About
While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary, spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite (S3), a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Clinical Diagnosis | S3 1.0 (test) | Precision | 93 | 36 |
| Seizure Sequence Recognition (Task 5) | Seizure Semiology Suite (S3) | Edit Distance | 7.47 | 12 |
| Temporal Seizure Localization (Task 4) | Seizure Semiology Suite (S3) | Start Time MAE | 15.66 | 12 |
| Narrative report generation | S3 1.0 (test) | RQI Score | 36.44 | 12 |
| Seizure Semiology Classification (Task 3) | Seizure Semiology Suite (S3) | Head Turning Precision | 0.00 | 12 |
| Seizure Semiology Recognition | Seizure-Semiology-Suite (S3) Whole Dataset 1.0 | -- | -- | 10 |
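
Two of the metrics listed above, Edit Distance (sequence recognition) and Start Time MAE (temporal localization), have standard textbook definitions. The sketch below shows how such scores could be computed; it is not the benchmark's official scoring code. It assumes that a predicted symptom sequence is compared to the reference with token-level Levenshtein distance, and that start times are paired one-to-one and expressed in seconds. The function names and all feature labels other than "head turning" are hypothetical.

```python
from typing import Sequence


def edit_distance(pred: Sequence[str], ref: Sequence[str]) -> int:
    """Token-level Levenshtein distance between two symptom sequences.

    Assumption: each symptom label is one token; insertions, deletions,
    and substitutions all cost 1.
    """
    # prev[j] holds the distance between the processed prefix of pred and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # delete p from pred
                            curr[j - 1] + 1,      # insert r into pred
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def start_time_mae(pred_starts: Sequence[float], ref_starts: Sequence[float]) -> float:
    """Mean absolute error between paired predicted and reference start times.

    Assumption: the two sequences are already aligned one-to-one.
    """
    if len(pred_starts) != len(ref_starts) or not ref_starts:
        raise ValueError("start-time lists must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(pred_starts, ref_starts))
    return total / len(ref_starts)


# Hypothetical example: the model predicts one extra symptom at the start.
predicted = ["head turning", "arm flexion", "tonic posturing"]
reference = ["arm flexion", "tonic posturing"]
sequence_error = edit_distance(predicted, reference)

# Hypothetical start times in seconds.
onset_error = start_time_mae([10.0, 20.0], [12.0, 17.0])
```

Under these assumptions, lower values are better for both metrics: an edit distance of zero means the predicted symptom order matches the reference exactly, and a start-time MAE of zero means every onset was localized perfectly.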