SimpleTool: Parallel Decoding for Real-Time LLM Function Calling

About

LLM-based function calling enables intelligent agents to interact with external tools and environments, yet autoregressive decoding imposes a fundamental latency bottleneck that limits real-time applications such as embodied intelligence, game AI, and interactive avatars, which may require control frequencies of around 10 Hz. We observe that function calling differs fundamentally from free-form text generation: structured outputs exhibit substantial token redundancy (delimiters, parameter names), and arguments exhibit weak causal dependencies. Crucially, these two properties must be exploited jointly to achieve real-time performance. We present SimpleTool, which introduces special tokens that serve a dual role: compressing low-entropy tokens (4-6x reduction) while acting as mode selectors that enable independent parallel generation of the function name and arguments. This synergistic design achieves a 3-6x end-to-end speedup (up to 9.6x) with only 8.2% parallelization overhead. Experiments on five benchmarks across Qwen-series models (0.5B-14B) demonstrate substantial speedup while maintaining competitive or improved accuracy. On Mobile Actions, ST-Qwen-0.5B outperforms Google's FunctionGemma in both accuracy and latency consistency. With quantization on a consumer-grade GPU, SimpleTool achieves 61.2 ms P50 latency, enabling 16 Hz real-time control at the 4B model scale and bridging the gap between LLM function calling and latency-critical real-world deployment.
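The latency figures in the abstract can be sanity-checked with simple arithmetic, and the benefit of decoding the function name and arguments in parallel can be illustrated with a toy step-count model. The sketch below is only an illustration under stated assumptions: the per-head token counts are hypothetical (they do not come from the paper), and the model ignores both the reported parallelization overhead and the special-token compression.

```python
def control_frequency_hz(latency_ms: float) -> float:
    """Upper bound on control-loop rate if each step waits for one full call."""
    return 1000.0 / latency_ms


def sequential_steps(head_lengths) -> int:
    """Autoregressive decoding emits every token one after another."""
    return sum(head_lengths)


def parallel_steps(head_lengths) -> int:
    """Independent heads decode side by side; the longest head sets the step count."""
    return max(head_lengths)


# Reported P50 latency from the abstract.
p50_latency_ms = 61.2
frequency = control_frequency_hz(p50_latency_ms)

# Hypothetical token counts per output head (assumption, for illustration only).
heads = {"function_name": 3, "arg_1": 4, "arg_2": 2}
seq = sequential_steps(heads.values())
par = parallel_steps(heads.values())

print(f"{p50_latency_ms} ms per call -> {frequency:.2f} Hz")
print(f"sequential steps: {seq}, parallel steps: {par}, ratio: {seq / par:.2f}x")
```

A 10 Hz control loop leaves a budget of 100 ms per call, so a 61.2 ms P50 latency fits within that budget and is consistent with the 16 Hz figure quoted in the abstract.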

Xiaoxin Shi, Jiaxin Wan, Linkang Dong, Wei Jiang, Yue Liu, Zengfeng Huang • 2026

Related benchmarks

Task | Dataset | Result | Rank
Function Calling | Mobile Actions | Overall Accuracy: 84.5 | 12
Function Calling | Others (SealTools, OpenFunc, ToolAlpaca) | Overall Accuracy: 87.4 | 12
Function Calling | BFCL Non-Live v3 | Overall Accuracy: 93.5 | 12
Function Calling | BFCL Live v3 | Overall Accuracy: 76.4 | 12
Function Calling | BFCL Exec v3 | Overall Accuracy: 92.6 | 12
Function Calling | Mobile Actions | Accuracy: 86.2 | 5