QIME: Constructing Interpretable Medical Text Embeddings via Ontology-Grounded Questions

About

While dense biomedical embeddings achieve strong performance, their black-box nature limits their utility in clinical decision-making. Recent question-based interpretable embeddings represent text as binary answers to natural-language questions, but these approaches often rely on heuristic or surface-level contrastive signals and overlook specialized domain knowledge. We propose QIME, an ontology-grounded framework for constructing interpretable medical text embeddings in which each dimension corresponds to a clinically meaningful yes/no question. By conditioning on cluster-specific medical concept signatures, QIME generates semantically atomic questions that capture fine-grained distinctions in biomedical text. Furthermore, QIME supports a training-free embedding construction strategy that eliminates per-question classifier training while further improving performance. Experiments across biomedical semantic similarity, clustering, and retrieval benchmarks show that QIME consistently outperforms prior interpretable embedding methods and substantially narrows the gap to strong black-box biomedical encoders, while providing concise and clinically informative explanations.
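The idea of a question-based interpretable embedding, where each dimension is the binary answer to a yes/no question, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three questions, the keyword-matching `answer` function (a toy stand-in for whatever model actually answers each question), and the helper names `embed`, `cosine`, and `explain` are hypothetical and are not taken from QIME, which derives its questions from ontology-grounded concept signatures.

```python
import math

# Hypothetical questions with toy keyword triggers (illustration only;
# not the questions or the answering mechanism used by QIME).
QUESTIONS = [
    ("Does the text mention a viral infection?", ("virus", "viral", "covid")),
    ("Does the text discuss a drug treatment?", ("drug", "dose", "therapy")),
    ("Does the text involve the cardiovascular system?", ("heart", "cardiac")),
]


def answer(text, keywords):
    """Toy stand-in answerer: 1 if any keyword occurs in the text, else 0."""
    lowered = text.lower()
    return 1 if any(k in lowered for k in keywords) else 0


def embed(text):
    """Build a binary embedding with one dimension per yes/no question."""
    return [answer(text, keywords) for _, keywords in QUESTIONS]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def explain(u, v):
    """List the questions both texts answer 'yes' to."""
    return [q for (q, _), a, b in zip(QUESTIONS, u, v) if a == 1 and b == 1]


a = embed("Antiviral drug therapy for COVID patients")
b = embed("Cardiac complications after viral infection")
print(a, b, cosine(a, b), explain(a, b))
```

Because every dimension maps to a readable question, the similarity between two texts can be traced back to the specific questions they share, which is what distinguishes this family of embeddings from dense black-box encoders.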

Yixuan Tang, Zhenghong Lin, Yandong Sun, Wynne Hsu, Mong Li Lee, Anthony K.H. Tung • 2026

Related benchmarks

Task	Dataset	Metric	Result
Semantic Textual Similarity	BIOSSES	Spearman Correlation	79.66	55
Information Retrieval	COVID	nDCG@10	64.65	50
Information Retrieval	NFCorpus	nDCG@10	25.09	33
Information Retrieval	MedQA	nDCG@10	62.36	23
Clustering	BiorxivClustering S2S	V-Measure	36.83	18
Clustering	MedrxivClusteringP2P (MedP2P)	V-Measure	33.92	18
Clustering	MedrxivClustering S2S	V-Measure	32	18
Information Retrieval	PHQA	nDCG@10	75.64	18
Clustering	ClusTREC-Covid	V-Measure	81.99	18
Information Retrieval	R2-IYI	nDCG@10	11.79	18

Showing 10 of 11 rows
