SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation
About
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning capacity required for high-quality generation. Our pilot study shows that purely local enhancements are insufficient to reliably bridge this gap. We therefore propose SpecSteer, an asymmetric collaborative inference framework that synergizes private on-device context with cloud-scale reasoning. SpecSteer casts collaboration as Bayesian knowledge fusion and repurposes speculative decoding as a distributed alignment protocol, yielding a Draft-Verify-Recover pipeline: the on-device model drafts personalized sequences; the cloud validates them via a ratio-based mechanism that decouples reasoning verification from private context, filtering logical flaws without accessing raw user data; upon rejection, a steering-recovery step injects local intent during correction. Experiments demonstrate that SpecSteer closes the reasoning gap and achieves superior personalized generation performance, while delivering a 2.36x speedup over standard baselines.
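The Draft-Verify-Recover pipeline can be illustrated with a toy sketch in the style of speculative decoding's ratio-based acceptance rule. Everything here is a simplified assumption for illustration: the function names, the `steer_recover` correction hook, and the use of plain per-token probabilities stand in for SpecSteer's actual model calls, which the abstract does not specify.

```python
import random

def draft_verify_recover(draft_tokens, p_local, p_cloud, steer_recover, seed=0):
    """Toy Draft-Verify-Recover loop (illustrative, not SpecSteer's real API).

    draft_tokens     -- tokens proposed by the on-device (draft) model
    p_local(t)       -- on-device probability assigned to token t
    p_cloud(t)       -- cloud probability for t, computed without private context
    steer_recover(t) -- hypothetical correction hook injecting local intent
    """
    rng = random.Random(seed)
    accepted = []
    for t in draft_tokens:
        # Ratio-based verification: accept token t with probability
        # min(1, p_cloud(t) / p_local(t)), as in speculative decoding.
        ratio = p_cloud(t) / max(p_local(t), 1e-9)
        if rng.random() < min(1.0, ratio):
            accepted.append(t)
        else:
            # Steering recovery: replace the rejected token with a corrected
            # one and end this draft round.
            accepted.append(steer_recover(t))
            break
    return accepted
```

With toy distributions, a token the cloud finds likely is kept, while an unlikely one is rejected and recovered, e.g. `draft_verify_recover(["a", "b"], lambda t: 0.5, lambda t: {"a": 0.9, "b": 0.01}[t], lambda t: t + "*")` keeps `"a"` and corrects `"b"`.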
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Abstractive Summarization | LongLaMP Abs | ROUGE-1: 43.91 | 16 |
| Personalized Writing | LongLaMP Wri | ROUGE-1: 30.79 | 16 |
| Review Generation | LongLaMP Rev | ROUGE-1: 34.81 | 16 |
| Personalized Generation | LongLaMP (Pair A) - Abstract (test) | ROUGE-1: 41.35 | 8 |
| Personalized Generation | LongLaMP (Pair A) - Review (test) | ROUGE-1: 33.03 | 8 |
| Personalized Generation | LongLaMP (Pair A) - Writing (test) | ROUGE-1: 30.79 | 8 |