Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling
About
Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of 4 Macro-F1 points on low-frequency roles. We further analyze the implications in the era of Large Language Models and complement our findings with expert evaluation.
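To make the PBR idea concrete, here is a minimal sketch of a distance-based auxiliary loss that pulls each sentence embedding toward the prototype of its rhetorical role. This is an illustrative NumPy mock-up under our own simplifying assumptions (squared Euclidean distance, prototypes as a plain array), not the paper's actual implementation; the function name `pbr_aux_loss` is hypothetical.

```python
import numpy as np

def pbr_aux_loss(embeddings, labels, prototypes):
    """Distance-based auxiliary term (illustrative): mean squared
    Euclidean distance between each sentence embedding and the
    prototype of its assigned rhetorical role.

    embeddings : (N, D) sentence representations
    labels     : (N,)   integer role ids
    prototypes : (K, D) one learnable prototype per role
    """
    diffs = embeddings - prototypes[labels]            # (N, D) residuals
    return float(np.mean(np.sum(diffs ** 2, axis=1)))  # mean squared distance

# Toy usage: 4 sentence embeddings, 2 roles, embedding dim 3.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))
labels = np.array([0, 1, 0, 1])
protos = rng.normal(size=(2, 3))
loss = pbr_aux_loss(emb, labels, protos)  # added to the main task loss
```

In training, this term would be weighted and summed with the classification loss, so gradients flow into both the encoder and the prototypes, structuring the latent space around role clusters.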
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Rhetorical Role Labeling | SCOTUS RF | Weighted F1 | 80.92 | 13 |
| Rhetorical Role Labeling | PubMed | Macro-F1 | 88.86 | 13 |
| Rhetorical Role Labeling | CS-ABSTRACTS | Weighted F1 | 78.09 | 13 |
| Rhetorical Role Labeling | SCOTUS Category | Macro-F1 | 84.13 | 7 |
| Rhetorical Role Labeling | SCOTUS Steps | Macro-F1 | 54.62 | 7 |
| Rhetorical Role Labeling | LEGALEVAL | Macro-F1 | 82.5 | 7 |
| Rhetorical Role Labeling | DEEPRHOLE | Macro-F1 | 47.3 | 7 |