Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling
About
Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of 4 Macro-F1 points on low-frequency roles. We further analyze the implications in the era of Large Language Models and complement our findings with expert evaluation.
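To make the PBR idea concrete, here is a minimal sketch of a distance-based auxiliary loss that pulls each sentence embedding toward the prototype of its rhetorical role. This is an illustrative NumPy mock-up under our own simplifying assumptions (squared Euclidean distance, prototypes as a plain array), not the paper's actual implementation; the function name `pbr_aux_loss` is hypothetical.

```python
import numpy as np

def pbr_aux_loss(embeddings, labels, prototypes):
    """Distance-based auxiliary term (illustrative): mean squared
    Euclidean distance between each sentence embedding and the
    prototype of its assigned rhetorical role.

    embeddings : (N, D) sentence representations
    labels     : (N,)   integer role ids
    prototypes : (K, D) one learnable prototype per role
    """
    diffs = embeddings - prototypes[labels]            # (N, D) residuals
    return float(np.mean(np.sum(diffs ** 2, axis=1)))  # mean squared distance

# Toy usage: 4 sentence embeddings, 2 roles, embedding dim 3.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))
labels = np.array([0, 1, 0, 1])
protos = rng.normal(size=(2, 3))
loss = pbr_aux_loss(emb, labels, protos)  # added to the main task loss
```

In training, this term would be weighted and summed with the classification loss, so gradients flow into both the encoder and the prototypes, structuring the latent space around role clusters.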
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Rhetorical Role Labeling | SCOTUS RF | Weighted F1 | 80.92 | 13 |
| Rhetorical Role Labeling | PubMed | Macro-F1 | 88.86 | 13 |
| Rhetorical Role Labeling | CS-ABSTRACTS | Weighted F1 | 78.09 | 13 |
| Rhetorical Role Labeling | SCOTUS Category | Macro-F1 | 84.13 | 7 |
| Rhetorical Role Labeling | SCOTUS Steps | Macro-F1 | 54.62 | 7 |
| Rhetorical Role Labeling | LEGALEVAL | Macro-F1 | 82.5 | 7 |
| Rhetorical Role Labeling | DEEPRHOLE | Macro-F1 | 47.3 | 7 |