HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing
About
Retrieval-augmented generation (RAG) for biomedical knowledge faces a hierarchy-aware ontology grounding challenge: resources like HPO, DO, and MeSH use deep "is-a" taxonomies, yet production stacks rely on Euclidean embeddings and ANN indexes. While hyperbolic embeddings suit hierarchical representation, they face two barriers: (i) lack of native vector database support, and (ii) risk of underperforming on entity-centric queries where hierarchy is irrelevant. We present HyEm, a lightweight retrieval layer that integrates hyperbolic ontology embeddings into existing Euclidean ANN infrastructure. HyEm learns radius-controlled hyperbolic embeddings, stores their origin log-mapped vectors in standard Euclidean databases for candidate retrieval, then applies exact hyperbolic reranking. A query-adaptive gate outputs continuous mixing weights that combine Euclidean semantic similarity with hyperbolic hierarchy distance at reranking time. Our bi-Lipschitz analysis under radius constraints provides practical guidance for ANN oversampling and dimensionality. Experiments on biomedical ontology subsets demonstrate that HyEm preserves 94-98% of Euclidean baseline performance on entity-centric queries while substantially improving hierarchy-navigation and mixed-intent queries, maintaining indexability at moderate oversampling.
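The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Poincaré ball of curvature -1, replaces the ANN index with a brute-force scan, and takes the gate output as a given scalar `alpha`. The function names, the `oversample` parameter, and the linear scoring rule are hypothetical choices made for illustration only.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def log0(x):
    """Origin log map on the Poincare ball (curvature -1, assumed):
    maps a ball point to a tangent vector with Euclidean norm artanh(||x||)."""
    r = norm(x)
    if r == 0.0:
        return list(x)
    scale = math.atanh(r) / r
    return [scale * a for a in x]

def poincare_dist(x, y):
    """Exact geodesic distance between two points inside the unit ball."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - norm(x) ** 2) * (1.0 - norm(y) ** 2)
    return math.acosh(1.0 + 2.0 * diff2 / denom)

def cosine(u, v):
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def retrieve(query_hyp, query_sem, hyp, sem, k, oversample, alpha):
    """Hypothetical two-stage retrieval.

    hyp / sem: dicts mapping entity id -> hyperbolic / semantic vector.
    alpha: mixing weight in [0, 1], standing in for the gate's output.
    """
    # Stage 1: candidate retrieval by Euclidean distance between
    # log-mapped vectors (brute force here; an ANN index in practice).
    q_tan = log0(query_hyp)
    tangent = {i: log0(x) for i, x in hyp.items()}

    def euclid(u, v):
        return norm([a - b for a, b in zip(u, v)])

    candidates = sorted(hyp, key=lambda i: euclid(q_tan, tangent[i]))
    candidates = candidates[: k * oversample]

    # Stage 2: exact rerank mixing semantic similarity with hyperbolic
    # distance (an assumed linear combination, not the paper's formula).
    def score(i):
        return ((1.0 - alpha) * cosine(query_sem, sem[i])
                - alpha * poincare_dist(query_hyp, hyp[i]))

    return sorted(candidates, key=score, reverse=True)[:k]
```

In this sketch, `alpha` near 0 reduces the rerank to Euclidean semantic similarity, which suits entity-centric queries, while `alpha` near 1 ranks candidates by hyperbolic distance alone. Retrieving `k * oversample` candidates in stage 1 is one simple way to compensate for the distortion that the log map introduces relative to true hyperbolic distance.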
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Taxonomy-navigation retrieval | HPO-5k | Parent Hits@5 | 16.4 | 6 |
| Taxonomy-navigation retrieval | DO-5k | Parent Hits@5 | 12.8 | 6 |
| Entity-centric retrieval | HPO-5k seed=0 | Hits@1 | 73.6 | 5 |
| Entity-centric retrieval | DO-5k seed=0 | Hits@1 | 59.8 | 5 |
| Ontology Retrieval | HPO-20k | Q-E Retention | 95.9 | 1 |
| Ontology Retrieval | DO-20k | Q-E Retention | 96.4 | 1 |