Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions
About
Output-length prediction is important for efficient LLM serving, as it directly affects batching, memory reservation, and scheduling. For prompt-only length prediction, most existing methods use a one-shot sampled length as the label, implicitly treating each prompt as if it had one true target length. We show that this is unreliable: even under a fixed model and decoding setup, the same prompt induces a *prompt-conditioned output length distribution*, not a deterministic scalar, and this distribution is consistent with *heavy-tailed* behavior. Motivated by this, we cast length prediction as robust estimation from heavy-tailed prompt-conditioned length distributions. We propose prompt-conditioned length distribution (ProD) methods, which construct training targets from multiple independent generations of the same prompt. Two variants are developed to reuse the served LLM's hidden states: ProD-M, which uses a median-based target for robust point prediction, and ProD-D, which uses a distributional target that preserves prompt-conditioned uncertainty. We provide theoretical justifications by analyzing the estimation error under a surrogate model. Experiments across diverse scenarios show consistent gains in prediction quality.
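The core idea of target construction can be illustrated with a short sketch. Assuming we have `k` output lengths sampled independently for one prompt, a ProD-M-style target is the sample median (robust to heavy-tailed outliers), and a ProD-D-style target is an empirical distribution over length bins. The function name, bin scheme, and parameters below are illustrative assumptions, not the paper's exact implementation:

```python
from statistics import median
from typing import List, Tuple

def prod_targets(lengths: List[int], num_bins: int = 8,
                 max_len: int = 1024) -> Tuple[float, List[float]]:
    """Build ProD-style training targets from k sampled output lengths
    for one prompt (illustrative sketch; names are hypothetical).

    Returns:
      - a median-based point target (ProD-M style), robust to outliers;
      - a normalized histogram over equal-width length bins (ProD-D style),
        preserving prompt-conditioned uncertainty.
    """
    m = median(lengths)  # robust point target: one huge sample barely moves it
    bin_width = max_len / num_bins
    hist = [0.0] * num_bins
    for length in lengths:
        idx = min(int(length // bin_width), num_bins - 1)  # clamp tail to last bin
        hist[idx] += 1.0
    total = sum(hist)
    dist = [h / total for h in hist]  # empirical length distribution
    return m, dist

# Heavy-tailed example: one extreme generation out of five
lengths = [40, 52, 47, 45, 900]
m, dist = prod_targets(lengths)
# The median target is 47, whereas the one-shot label could have been 900;
# the distributional target still records the 20% mass in the tail bin.
```

In contrast, a mean-based target for the same samples would be pulled to about 217 tokens by the single outlier, which is the failure mode that motivates the median-based construction.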
Related benchmarks
| Task | Dataset | Result (MAE) | Rank |
|---|---|---|---|
| Output Length Prediction | GSM8K (test) | 19.57 | 16 |
| Output Length Prediction | LongBench (test) | 37.68 | 16 |
| Output Length Prediction | MBPP (test) | 26.61 | 16 |
| Output Length Prediction | LMSYS-Chat-1M (test) | 93.39 | 16 |