Model Compression with Exact Budget Constraints via Riemannian Manifolds

About

Assigning one of K options to each of N groups under a total cost budget is a recurring problem in efficient AI, including mixed-precision quantization, non-uniform pruning, and expert selection. The objective, typically model loss, depends jointly on all assignments and does not decompose across groups, preventing combinatorial solvers from directly optimizing the true objective and forcing reliance on proxy formulations. Methods such as evolutionary search evaluate the actual loss but lack gradient information, while penalty-based approaches enforce the budget only approximately and often require extensive hyperparameter tuning. We present a new approach by showing that, under softmax relaxation, the budget constraint defines a smooth Riemannian manifold in logit space with unusually simple geometry. The normal vector admits a closed-form expression, shifting logits along the cost vector changes expected cost monotonically, and vector transport reduces to a single inner product. Building on these properties, we propose Riemannian Constrained Optimization (RCO), which augments a standard Adam step with tangent projection, binary-search retraction, and momentum transport. Combined with Gumbel straight-through estimation and budget-constrained dynamic programming for discrete feasibility, RCO enables first-order optimization of the actual loss under exact budget enforcement without introducing constraint-specific hyperparameters. Across both synthetic benchmarks and realistic LLM compression settings, RCO matches or exceeds state-of-the-art methods while often requiring substantially less wall-clock time. Source code is available at https://github.com/IST-DASLab/RCO.
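The geometric ingredients named above can be illustrated with a small NumPy sketch. This is not the authors' implementation (the linked repository is the reference for that); it is a minimal illustration under stated assumptions: logits and costs are stored as N-by-K arrays, the constraint is that the expected cost under a per-group softmax equals the budget, and the update uses a plain gradient step rather than Adam, with no momentum transport. All function names are hypothetical. The normal vector is the gradient of the expected cost with respect to the logits, which follows from differentiating the softmax. Shifting the logits by t times the cost array changes the expected cost at a rate equal to the sum of per-group cost variances, which is non-negative, so a binary search over t can restore the budget.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_cost(logits, costs):
    # Total expected cost: sum over groups of sum_k p_ik * c_ik.
    return float((softmax(logits) * costs).sum())

def constraint_normal(logits, costs):
    # Gradient of expected_cost w.r.t. logits: p_ik * (c_ik - E_p[c_i]).
    p = softmax(logits)
    mean_cost = (p * costs).sum(axis=1, keepdims=True)
    return p * (costs - mean_cost)

def project_tangent(grad, normal):
    # Remove the component of grad along the constraint normal.
    denom = (normal * normal).sum()
    if denom == 0.0:
        return grad
    return grad - ((grad * normal).sum() / denom) * normal

def retract(logits, costs, budget, lo=-50.0, hi=50.0, iters=60):
    # Binary search over the shift t in logits + t * costs.
    # Assumes the budget is reachable for some t in [lo, hi] (an assumption
    # of this sketch); expected cost is non-decreasing in t.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_cost(logits + mid * costs, costs) < budget:
            lo = mid
        else:
            hi = mid
    return logits + 0.5 * (lo + hi) * costs

def constrained_step(logits, costs, budget, grad, lr=0.1):
    # Illustrative step: tangent projection, plain gradient step, retraction.
    normal = constraint_normal(logits, costs)
    step = project_tangent(grad, normal)
    return retract(logits - lr * step, costs, budget)
```

Calling `constrained_step` with the loss gradient at the current logits returns new logits whose expected cost matches the budget up to the tolerance of the binary search, provided the budget lies strictly between the minimum and maximum achievable total cost.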

Michael Helcig, Dan Alistarh • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | C4 | Perplexity 17.4 | 1688 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 75.7 | 711 |
| Multiple-choice Question Answering | MMLU | Accuracy 73.3 | 210 |
| Language Modeling | FineWeb-Edu | PPL 11.14 | 141 |
| Coreference Resolution | WinoGrande | Accuracy 71.4 | 61 |
| Boolean Question Answering | BoolQ | Acc (Normalized) 88.5 | 20 |
| Aggregated LLM Evaluation | 8 Standard Benchmarks Aggregate | Average Accuracy 71 | 5 |
| General Language Understanding and Reasoning | General LLM Evaluation Suite (ARC-C, ARC-E, BoolQ, HellaSwag, MMLU, OBQA, RTE, WinoGrande) | ARC-Challenge Accuracy 58.4 | 5 |