ProtoSiTex: Learning Semi-Interpretable Prototypes for Multi-label Text Classification

About

The rapid growth of user-generated text across digital platforms has intensified the need for interpretable models capable of fine-grained text classification and explanation. Existing prototype-based models offer intuitive explanations but typically operate at coarse granularity (sentence or document level) and fail to address the multi-label nature of real-world text classification. We propose ProtoSiTex, a semi-interpretable framework designed for fine-grained multi-label text classification. ProtoSiTex employs a dual-phase alternate training strategy: an unsupervised prototype discovery phase that learns semantically coherent and diverse prototypes, and a supervised classification phase that maps these prototypes to class labels. A hierarchical loss function enforces consistency across subsentence, sentence, and document levels, enhancing interpretability and alignment. Unlike prior approaches, ProtoSiTex captures overlapping and conflicting semantics using adaptive prototypes and multi-head attention. We also introduce a benchmark dataset of hotel reviews annotated at the subsentence level with multiple labels. Experiments on this dataset and two public benchmarks (binary and multi-class) show that ProtoSiTex achieves state-of-the-art performance while delivering faithful, human-aligned explanations, establishing it as a robust solution for semi-interpretable multi-label text classification.

Utsav Kumar Nareti, Suraj Kumar, Soumya Pandey, Soumi Chattopadhyay, Chandranath Adak, Sankha Subhra Mullick• 2025

Related benchmarks

Task	Dataset	Result	Rank
Text Classification	TweetEVAL (test)	Accuracy (A)84.17		44
Multi-label Text Classification	Hotel Reviews (HR) (test)	F-Measure87.22		44

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord