SelRoute: Query-Type-Aware Routing for Long-Term Conversational Memory Retrieval

About

Retrieving relevant past interactions from long-term conversational memory typically relies on large dense retrieval models (110M-1.5B parameters) or LLM-augmented indexing. We introduce SelRoute, a framework that routes each query to a specialized retrieval pipeline -- lexical, semantic, hybrid, or vocabulary-enriched -- based on its query type. On LongMemEval_M (Wu et al., 2024), SelRoute achieves Recall@5 of 0.800 with bge-base-en-v1.5 (109M parameters) and 0.786 with bge-small-en-v1.5 (33M parameters), compared to 0.762 for Contriever with LLM-generated fact keys. A zero-ML baseline using SQLite FTS5 alone achieves NDCG@5 of 0.692, already exceeding all published baselines on ranking quality -- a gap we attribute partly to implementation differences in lexical retrieval. Five-fold stratified cross-validation confirms routing stability (CV gap of 1.3-2.4 Recall@5 points; routes stable for 4/6 query types across folds). A regex-based query-type classifier achieves 83% effective routing accuracy, and end-to-end retrieval with predicted types (Recall@5 = 0.689) still outperforms uniform baselines. Cross-benchmark evaluation on 8 additional benchmarks spanning 62,000+ instances -- including MSDialog, LoCoMo, QReCC, and PerLTQA -- confirms generalization without benchmark-specific tuning, while exposing a clear failure mode on reasoning-intensive retrieval (RECOR Recall@5 = 0.149) that bounds the claim. We also identify an enrichment-embedding asymmetry: vocabulary expansion at storage time improves lexical search but degrades embedding search, motivating per-pipeline enrichment decisions. The full system requires no GPU and no LLM inference at query time.
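The routing idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the query-type labels, the regex patterns, the `ROUTES` table, and the toy `lexical_overlap` pipeline are all hypothetical, since the abstract names neither the six query types nor the classifier's patterns.

```python
import re
from typing import Callable, Dict, List

# Hypothetical query types and patterns, invented for illustration only.
# The first matching pattern wins, so order encodes priority.
TYPE_PATTERNS = [
    ("temporal", re.compile(r"\b(when|how long|before|after|last (week|month|year))\b", re.I)),
    ("preference", re.compile(r"\b(recommend|suggest|favou?rite|prefer)\b", re.I)),
    ("factual", re.compile(r"\b(what|which|who|where)\b", re.I)),
]
DEFAULT_TYPE = "other"

# Hypothetical routing table: query type -> name of a retrieval pipeline.
ROUTES: Dict[str, str] = {
    "temporal": "lexical",
    "preference": "semantic",
    "factual": "hybrid",
    "other": "hybrid",
}

# A pipeline takes (query, documents, k) and returns ranked document indices.
Pipeline = Callable[[str, List[str], int], List[int]]


def classify(query: str) -> str:
    """Return the first query type whose regex matches, else the default."""
    for label, pattern in TYPE_PATTERNS:
        if pattern.search(query):
            return label
    return DEFAULT_TYPE


def lexical_overlap(query: str, docs: List[str], k: int) -> List[int]:
    """Toy lexical pipeline: rank documents by number of shared lowercase tokens."""
    q_tokens = set(re.findall(r"\w+", query.lower()))
    scores = [len(q_tokens & set(re.findall(r"\w+", d.lower()))) for d in docs]
    # Sort by descending score; ties keep the original document order.
    order = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
    return order[:k]


def retrieve(query: str, docs: List[str], pipelines: Dict[str, Pipeline], k: int = 5) -> List[int]:
    """Classify the query, look up its route, and run the matching pipeline."""
    name = ROUTES[classify(query)]
    # Fall back to the lexical pipeline when the routed one is not registered.
    pipeline = pipelines.get(name, pipelines["lexical"])
    return pipeline(query, docs, k)
```

In a real system the `pipelines` dictionary would also hold semantic and hybrid retrievers; here only the lexical stand-in is registered, so every route falls back to it. The point of the sketch is the dispatch structure: classification is cheap and model-free, and each query type can be tuned or replaced independently.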

Matthew McKee • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Retrieval | LongMemEval-S | Recall@5: 92 | 17 |
| Retrieval | LongMemEval-M | Recall@5: 77.4 | 10 |
| Information Retrieval | Locomo | R@5: 76.7 | 8 |
| Retrieval | LongMemEval_M session-level granularity binary all-or-nothing recall | Recall@5: 80 | 8 |
| Retrieval | LongMemEval (session-level) | Ra@5: 80 | 8 |
| Retrieval | QReCC | -- | 8 |
| Retrieval | MSDialog | R@5: 99.8 | 1 |
| Retrieval | PerLTQA | Ra@5: 74.5 | 1 |
| Retrieval | Episodic Memory | Recall@5: 66.7 | 1 |
| Retrieval | LMEB | Recall@5: 97.7 | 1 |
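
For readers unfamiliar with the metrics reported above, the following sketch shows one common way to compute Recall@5 and NDCG@5 for a single query with binary relevance. The function names are hypothetical, and the exact definitions used by the paper and by this leaderboard are not stated here; in particular, `recall_all_at_k` is only one plausible reading of the "binary all-or-nothing recall" label.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def recall_all_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Assumed all-or-nothing variant: 1.0 only if every relevant item is in the top k."""
    return 1.0 if relevant and relevant <= set(ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)  # rank 1 has discount log2(2) = 1
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Benchmark-level scores are typically the mean of these per-query values. Recall@k ignores ordering within the top k, whereas NDCG@k rewards placing relevant items earlier, which is why a system can lead on one metric and not the other.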
