IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
About
Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR, a quality-constrained **I**ntelligent **P**rompt **R**outing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with tolerance parameter τ ∈ [0, 1] that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-scale dataset, IPRBench (to be released upon legal approval), a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family, and processes requests with sub-150ms latency. The deployed system and additional product details are publicly available at https://aws.amazon.com/bedrock/intelligent-prompt-routing/
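The abstract does not spell out the exact decision rule, but one natural reading of a tolerance-based router is: score every candidate with the quality estimator, then pick the cheapest model whose predicted quality is within τ of the best predicted score. Below is a minimal sketch under that assumption; the names `Candidate`, `predict_quality`, and `route`, and the cost units, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Candidate:
    name: str                 # model identifier, e.g. a Claude-family model
    cost_per_request: float   # relative serving cost (hypothetical units)

def route(
    prompt: str,
    candidates: Sequence[Candidate],
    predict_quality: Callable[[str, str], float],  # (prompt, model name) -> score in [0, 1]
    tau: float,
) -> Candidate:
    """Pick the cheapest candidate whose predicted quality is within
    tolerance tau of the best predicted quality for this prompt."""
    scores = {c.name: predict_quality(prompt, c.name) for c in candidates}
    best = max(scores.values())
    # Eligible models: those predicted to lose at most tau quality
    # relative to the strongest candidate on this prompt.
    eligible = [c for c in candidates if scores[c.name] >= best - tau]
    return min(eligible, key=lambda c: c.cost_per_request)
```

Under this reading, τ = 0 always routes to the model with the highest predicted quality, while larger τ widens the eligible set so cheaper models can win, which is how a user would trade quality for cost.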
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy | 92.2 | 321 |
| Question Answering | TriviaQA | EM | 24.3 | 182 |
| Code Generation | HumanEval | Pass@1 | 47.4 | 171 |
| Open-domain Question Answering | Natural Questions (NQ) | Exact Match (EM) | 57.6 | 74 |
| Commonsense Reasoning | CommonsenseQA (CSQA) | Accuracy | 81.7 | 56 |
| Science Question Answering | ARC | Accuracy | 97.7 | 46 |
| Mathematical Problem Solving | MATH | Accuracy | 93.6 | 32 |
| Coding | MBPP | Pass@1 | 83.3 | 30 |
| Aggregate performance evaluation | Aggregate 10-Benchmark Suite | Average Score | 76.3 | 29 |