PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving

About

Multi-agent systems increasingly orchestrate multiple specialized language models to solve complex real-world problems, often invoking them over a shared context. This execution pattern repeatedly processes the same prompt prefix across models. Consequently, each model redundantly executes the prefill stage and maintains its own key-value (KV) cache, increasing aggregate prefill load and worsening tail latency by intensifying prefill-decode interference in existing LLM serving stacks. Disaggregated serving reduces such interference by placing prefill and decode on separate GPUs, but disaggregation does not fundamentally eliminate inter-model redundancy in computation and KV storage for the same prompt. To address this issue, we propose PrefillShare, a novel algorithm that enables sharing the prefill stage across multiple models in a disaggregated setting. PrefillShare factorizes the model into prefill and decode modules, freezes the prefill module, and fine-tunes only the decode module. This design allows multiple task-specific models to share a prefill module and the KV cache generated for the same prompt. We further introduce a routing mechanism that enables effective prefill sharing across heterogeneous models in a vLLM-based disaggregated system. PrefillShare not only matches full fine-tuning accuracy on a broad range of tasks and models, but also delivers 4.5x lower p95 latency and 3.9x higher throughput in multi-model agent workloads.
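The sharing idea described above can be illustrated with a toy sketch. It is not the authors' implementation and does not use vLLM: the names `SharedPrefillCache`, `get_kv`, `decode`, and `run_agents` are hypothetical, and a string stands in for the real KV tensors. It only shows the bookkeeping: a prompt is prefilled once, and every task-specific decode module reuses the resulting cache entry.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SharedPrefillCache:
    """Toy stand-in for a frozen, shared prefill module and its KV cache."""
    store: dict = field(default_factory=dict)
    prefill_runs: int = 0  # counts how many times prefill actually executes

    def get_kv(self, prompt: str) -> str:
        # Key the cache by a hash of the prompt (an assumption made for this sketch).
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.prefill_runs += 1
            # Placeholder for the KV tensors a real prefill pass would produce.
            self.store[key] = f"kv:{key[:8]}"
        return self.store[key]


def decode(model_name: str, kv: str) -> str:
    """Hypothetical task-specific decode module that consumes the shared KV."""
    return f"{model_name} decoded from {kv}"


def run_agents(prompt: str, models: list, cache: SharedPrefillCache) -> list:
    """Route every model's request for the same prompt to the shared cache."""
    return [decode(name, cache.get_kv(prompt)) for name in models]
```

Calling `run_agents` with three model names and a single prompt leaves `prefill_runs` at 1, whereas a setup in which each model ran its own prefill would execute it three times.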

Sunghyeon Woo, Hoseung Kim, Sunghwan Shim, Minjung Jo, Hyunjoon Jeong, Jeongtae Lee, Joonghoon Kim, Sungjae Lee, Baeseong Park, Se Jung Kwon, Dongsoo Lee • 2026

Related benchmarks

Task                       Dataset                         Metric     Result  Rank
Math Word Problem Solving  GSM8K official 1.3k set (test)  Accuracy   84.8    53
Code Generation            HumanEval+ v1 (test)            Pass Rate  0.805   41
Code Generation            HumanEval v1 (test)             Accuracy   86.6    6
Tool Calling               BFCL Simple Python v1 (test)    Accuracy   93.5    6
Math Word Problem Solving  GSM+ v1 (test)                  Accuracy   64.5    6
Tool Calling               BFCL Multiple v1 (test)         Accuracy   91      6