
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

About

When using agent-task datasets to enhance agent capabilities for Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles - specifically, reasoning tokens versus boilerplate tokens (e.g., those governing output format) - differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively emphasizes reasoning tokens during fine-tuning, yielding notable performance gains over common Supervised Fine-Tuning (SFT).
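The core idea can be sketched in a few lines: a token counts as boilerplate if it remains predictable (low loss) even after input-output pairings are shuffled across samples, and reasoning tokens are then up-weighted in the fine-tuning loss. This is a minimal toy sketch, not the paper's implementation; the per-token shuffled losses are assumed to come from a model fine-tuned on shuffled data, and the threshold and weight values are illustrative.

```python
def classify_tokens(shuffled_losses, threshold=1.0):
    """Label each token by its loss under a model trained on shuffled
    input-output pairs: boilerplate tokens stay predictable (low loss),
    reasoning tokens do not. The threshold is an illustrative choice."""
    return ["boilerplate" if loss < threshold else "reasoning"
            for loss in shuffled_losses]

def reasoning_weighted_loss(token_losses, labels, reasoning_weight=2.0):
    """Weighted average of per-token losses that emphasizes reasoning
    tokens, in the spirit of reasoning-highlighted fine-tuning."""
    weights = [reasoning_weight if lab == "reasoning" else 1.0
               for lab in labels]
    return sum(w * l for w, l in zip(weights, token_losses)) / sum(weights)

# Per-token losses under the shuffled-data model (hypothetical values):
labels = classify_tokens([0.2, 3.5, 0.1, 2.8])
# Per-token losses on the real (unshuffled) data (hypothetical values):
loss = reasoning_weighted_loss([0.3, 2.0, 0.2, 1.5], labels)
```

In practice the per-token losses would be obtained from a causal LM with an unreduced cross-entropy loss, but the weighting logic is the same.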

Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng• 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Agent Tool Use | StableToolBench (Held-In) | Pass Rate: 50.4 | 14 |
| Agent Tool Use | T-eval (Held-Out) | Accuracy: 71.8 | 14 |
| Agent Tool Use | Nexus (Held-Out) | Accuracy: 32 | 14 |
| Function Calling | BFCL (Held-In) | Accuracy: 89.4 | 14 |
