SimpleTool: Parallel Decoding for Real-Time LLM Function Calling
About
LLM-based function calling enables intelligent agents to interact with external tools and environments, yet autoregressive decoding imposes a fundamental latency bottleneck that limits real-time applications such as embodied intelligence, game AI, and interactive avatars (e.g., 10 Hz control frequency). We observe that function calling differs fundamentally from free-form text generation: structured outputs exhibit substantial token redundancy (delimiters, parameter names), and arguments exhibit weak causal dependencies. Crucially, these two properties must be exploited jointly to achieve real-time performance. We present SimpleTool, which introduces special tokens that serve a dual role: compressing low-entropy tokens (a 4-6x reduction) while acting as mode selectors that enable independent parallel generation of the function name and its arguments. This synergistic design achieves a 3-6x end-to-end speedup (up to 9.6x) with only +8.2% parallelization overhead. Experiments on five benchmarks across Qwen-series models (0.5B-14B) demonstrate substantial speedup while maintaining competitive or improved accuracy. On Mobile Actions, ST-Qwen-0.5B outperforms Google's FunctionGemma in both accuracy and latency consistency. With quantization on a consumer-grade GPU, SimpleTool achieves 61.2 ms P50 latency, enabling 16 Hz real-time control at the 4B model scale and bridging the gap between LLM function calling and latency-critical real-world deployment.
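The two properties above can be illustrated with a minimal sketch. The token names (`<FN>`, `<ARG:...>`, `<END>`) and the toy "decoder" below are assumptions for illustration only, not SimpleTool's actual vocabulary or implementation: the first part shows how JSON scaffolding dominates the decode-step count, and the second shows why weakly dependent fields can be decoded as independent streams whose wall time is roughly the max, not the sum.

```python
import concurrent.futures
import time

# 1) Token redundancy: for a structured call, delimiters and parameter
#    names (low-entropy scaffolding) make up most of the decode steps.
verbose = ['{', '"', 'name', '"', ':', '"', 'set_timer', '"', ',',
           '"', 'arguments', '"', ':', '{', '"', 'minutes', '"', ':',
           '5', '}', '}']
# Hypothetical special tokens collapse the scaffolding into single
# vocabulary entries; only the informative values remain.
compressed = ['<FN>', 'set_timer', '<ARG:minutes>', '5', '<END>']
ratio = len(verbose) / len(compressed)  # ~4x fewer autoregressive steps

# 2) Weak causal dependencies: the function name and the arguments can
#    be produced by independent decode streams and merged afterwards.
def decode(field, steps, per_step=0.01):
    time.sleep(steps * per_step)  # stand-in for `steps` decode iterations
    return field

t0 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as ex:
    name_future = ex.submit(decode, 'set_timer', 2)
    args_future = ex.submit(decode, '{"minutes": 5}', 3)
    name, args = name_future.result(), args_future.result()
parallel_s = time.perf_counter() - t0  # ~max(2, 3) steps, not 2 + 3

print(f'{ratio:.1f}x fewer tokens; merged call: {name}({args}); '
      f'parallel wall time {parallel_s * 1000:.0f} ms')
```

In the real system the mode-selector tokens play both roles at once, which is why the paper stresses that compression and parallelization must be exploited jointly.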
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Function Calling | Mobile Actions | Overall Accuracy | 84.5 | 12 |
| Function Calling | Others (SealTools, OpenFunc, ToolAlpaca) | Overall Accuracy | 87.4 | 12 |
| Function Calling | BFCL Non-Live v3 | Overall Accuracy | 93.5 | 12 |
| Function Calling | BFCL Live v3 | Overall Accuracy | 76.4 | 12 |
| Function Calling | BFCL Exec v3 | Overall Accuracy | 92.6 | 12 |
| Function Calling | Mobile Actions | Accuracy | 86.2 | 5 |