
Learning Compact Representations of LLM Abilities via Item Response Theory

About

Recent years have witnessed a surge in the number of large language models (LLMs), yet efficiently managing and utilizing these vast resources remains a significant challenge. In this work, we explore how to learn compact representations of LLM abilities that can facilitate downstream tasks such as model routing and performance prediction on new benchmarks. We frame this problem as estimating the probability that a given model will correctly answer a specific query. Inspired by item response theory (IRT) in psychometrics, we model this probability as a function of three key factors: (1) the model's multi-skill ability vector, (2) the query's discrimination vector, which separates models of differing skills, and (3) the query's difficulty scalar. To learn these parameters jointly, we introduce a Mixture-of-Experts (MoE) network that couples model-level and query-level embeddings. Extensive experiments show that our approach achieves state-of-the-art performance in both model routing and benchmark accuracy prediction. Moreover, further analysis shows that the learned parameters encode meaningful, interpretable information about model capabilities and query characteristics.
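The abstract does not state the exact link function, so the sketch below assumes the standard multidimensional two-parameter logistic (M2PL) form from IRT: P(correct) = sigmoid(a_q · theta_m - b_q), where theta_m is model m's ability vector, a_q is the query's discrimination vector, and b_q is its difficulty. It illustrates how such parameters, once learned, could serve the two downstream tasks named in the abstract: routing a query to the model with the highest predicted probability of success, and predicting benchmark accuracy as the mean predicted probability over the benchmark's queries. All function names and the toy parameter values are hypothetical, and the MoE network that actually learns the parameters is not reproduced here.

```python
import math
from typing import Dict, Sequence, Tuple

# A query is represented by its (discrimination vector, difficulty scalar).
Query = Tuple[Sequence[float], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_correct(theta: Sequence[float], a: Sequence[float], b: float) -> float:
    """Assumed M2PL form: P(correct) = sigmoid(a . theta - b).

    theta: the model's multi-skill ability vector
    a:     the query's discrimination vector (same length as theta)
    b:     the query's difficulty scalar
    """
    if len(theta) != len(a):
        raise ValueError("ability and discrimination vectors must have the same length")
    logit = sum(ai * ti for ai, ti in zip(a, theta)) - b
    return sigmoid(logit)


def route(a: Sequence[float], b: float, abilities: Dict[str, Sequence[float]]) -> str:
    """Hypothetical router: pick the model most likely to answer the query correctly."""
    return max(abilities, key=lambda name: p_correct(abilities[name], a, b))


def expected_accuracy(theta: Sequence[float], queries: Sequence[Query]) -> float:
    """Hypothetical benchmark-accuracy predictor: mean predicted P(correct) over queries."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(p_correct(theta, a, b) for a, b in queries) / len(queries)


if __name__ == "__main__":
    # Toy two-skill setup (skill 0 ~ "math", skill 1 ~ "logic"); values are made up.
    abilities = {"model_math": [2.0, 0.0], "model_logic": [0.0, 2.0]}
    math_query_a, math_query_b = [1.5, 0.1], 0.5  # query that mostly tests skill 0
    print(route(math_query_a, math_query_b, abilities))  # picks "model_math"
    print(expected_accuracy(abilities["model_logic"], [(math_query_a, math_query_b)]))
```

In this form, the discrimination vector decides which skills matter for a query, while a larger difficulty lowers the predicted probability for every model equally; routing then reduces to an argmax over models, and accuracy prediction to an average over queries.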

Jianhao Chen, Chenxu Wang, Gengrui Zhang, Peng Ye, Lei Bai, Wei Hu, Yuzhong Qu, Shuyue Hu • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Correctness Prediction | Overall (combined datasets) | Accuracy | 70.12 | 18
Correctness Prediction | MMLU | Accuracy | 65.83 | 18
Correctness Prediction | LogiQA | Accuracy | 65.07 | 18
Correctness Prediction | MedQA | Accuracy | 61.06 | 18
Correctness Prediction | MathQA | Accuracy | 65.85 | 18
Correctness Prediction | PIQA | Accuracy | 77.43 | 18
Model Routing | Model Routing Suite (MathQA, LogiQA, MedQA, PIQA, TruthQA, MMLU, GSM8k, GPQA, ASDiv, SoQA) | Overall Accuracy | 63.37 | 18
Correctness Prediction | ASDiv | Accuracy | 96.22 | 18
Correctness Prediction | TruthQA | Accuracy | 65.69 | 18
Correctness Prediction | GPQA | Accuracy | 79.02 | 18
(10 of the 14 benchmark rows are shown.)
