
D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

About

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool-use scenarios, leading to Lazy Reasoning. To address this, we propose D-CORE (Decomposing tasks and Composing Reasoning processes), a two-stage training framework that first incentivizes the LRMs' task-decomposition reasoning capability via self-distillation, then applies diversity-aware reinforcement learning (RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%, while D-CORE-14B establishes a new state of the art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.
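The abstract names a "diversity-aware" RL stage but gives no objective. As a purely illustrative sketch (the weighting scheme, function name, and trace representation below are assumptions, not the paper's actual method), one simple way to make a group-based RL update diversity-aware is to downweight sampled reasoning traces that share an identical decomposition plan, so rare plans contribute more:

```python
from collections import Counter

def diversity_weights(traces):
    """Hypothetical diversity weighting (NOT the paper's objective):
    each sampled trace is downweighted by the number of identical
    copies in the rollout group, so duplicated decomposition plans
    split their credit and diverse plans are relatively upweighted."""
    counts = Counter(traces)
    return [1.0 / counts[t] for t in traces]

# Toy rollout group: two identical plans and one distinct plan.
weights = diversity_weights(["plan_A", "plan_A", "plan_B"])
print(weights)  # -> [0.5, 0.5, 1.0]
```

In a GRPO-style update these weights would multiply each trace's advantage, but the actual mechanism used by D-CORE may differ; consult the paper and repository for details.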

Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Function Calling | BFCL V3 | Overall Accuracy: 79.3 | 88 |
| Tool Use Reasoning | ∞Bench | Avg Accuracy: 51.3 | 14 |
| Tool Use | ACEBench-en (out-of-distribution) | Normal Score: 77.9 | 8 |
| Tool Use | BFCL Agentic v4 (out-of-distribution) | Web-base Score: 39 | 8 |
| Tool Use | τ²-Bench (out-of-distribution) | Retail Score: 53.5 | 8 |

Other info

GitHub: https://github.com/alibaba/EfficientAI
