
OneComp: One-Line Revolution for Generative AI Model Compression

About

Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision budgets, data-driven calibration strategies, and hardware-dependent execution regimes. We present OneComp, an open-source compression framework that transforms this expert workflow into a reproducible, resource-adaptive pipeline. Given a model identifier and available hardware, OneComp automatically inspects the model, plans mixed-precision assignments, and executes progressive quantization stages, ranging from layer-wise compression through block-wise refinement to global refinement. A key architectural choice is treating the first quantized checkpoint as a deployable pivot, ensuring that each subsequent stage improves the same model and that quality increases as more compute is invested. By converting state-of-the-art compression research into an extensible, hardware-aware pipeline, OneComp bridges the gap between algorithmic innovation and production-grade model deployment.
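The resource-adaptive planning step described in the abstract can be illustrated with a toy sketch. This is not OneComp's actual API; the function name, layer names, and greedy demotion heuristic below are assumptions made purely for illustration of budget-driven mixed-precision assignment.

```python
# Hypothetical sketch of a budget-driven mixed-precision planner
# (illustration only, not the OneComp implementation): every layer
# starts at 8-bit, and the largest layers are demoted to 4-bit,
# largest first, until the total footprint fits the memory budget.

def plan_precision(layer_params, budget_bytes):
    """Return {layer_name: bits} whose footprint fits budget_bytes."""
    plan = {name: 8 for name in layer_params}  # default: 8-bit everywhere
    # Demote largest layers first: each demotion frees the most memory.
    for name in sorted(layer_params, key=layer_params.get, reverse=True):
        footprint = sum(layer_params[n] * plan[n] // 8 for n in plan)
        if footprint <= budget_bytes:
            break
        plan[name] = 4  # demote this layer to 4-bit
    return plan

# Toy model with assumed per-layer parameter counts.
layers = {"embed": 50_000_000, "block.0": 200_000_000,
          "block.1": 200_000_000, "head": 50_000_000}
plan = plan_precision(layers, budget_bytes=300_000_000)
total = sum(layers[n] * plan[n] // 8 for n in layers)
```

In this sketch the two large transformer blocks end up at 4-bit while the smaller embedding and head layers keep 8-bit, bringing the 500 MB full-precision-equivalent footprint down to the 300 MB budget; a real planner would additionally weigh per-layer sensitivity, calibration data, and hardware kernel support.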

Yuma Ichikawa, Keiji Kimura, Akihiro Yoshida, Yudai Fujimoto, Hiroki Tokura, Yamato Arai, Yoshiyuki Ishii, Yusei Kawakami, Genki Shikada, Achille Jacquemond, Yoshihiko Fujisawa, Katsuki Fujisawa, Takumi Honda, Akira Sakai • 2026

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 | Perplexity (PPL) 6.67 | 1624
Zero-shot Evaluation | Zero-shot Tasks Average | Accuracy 57.2 | 95
Zero-shot Classification | ARC-Challenge, ARC-Easy, PIQA, WinoGrande Average | Accuracy 73.9 | 27
