The Signal is in the Steps: Local Scoring for Reasoning Data Selection

About

Distilling long-form reasoning from teacher models into smaller students requires selecting which candidate solutions to train on. Recent work argues that one should select the responses to which the student model assigns the highest probability, i.e., favor solutions that are "natural" to the student. However, we find that this approach works within a single teacher but fails when scaling to long reasoning traces from multiple diverse teachers. We identify a key cause: this approach scores entire solutions, but students generalize by recombining familiar reasoning steps, not by memorizing complete solutions. Full-trajectory scoring optimizes the wrong target; it rewards global fluency, while the transferable signal lies in local step transitions. We propose Local Average Log Probability (LALP), which scores each reasoning step using only a small window of preceding context, measuring whether each step is justified by its immediate premises rather than whether the full response looks natural to the student. LALP enables two practical use cases: selecting the best teacher before fine-tuning and curating training data from diverse teacher pools. Across math, coding, and science reasoning tasks, LALP consistently improves accuracy by a large margin when used to select the most natural solutions.
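The local scoring idea can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption: steps are supplied as a list of strings, the window counts preceding steps, each step's score is its mean token log-probability given only that window, and the solution score is the mean over steps. The names `local_avg_logprob`, `select_most_natural`, and `toy_logprob_fn` are hypothetical, and the toy scorer merely stands in for a real student model.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer signature (an assumption, not from the paper):
# given a context string and a step string, return the student's
# log-probability for each token of the step.
LogProbFn = Callable[[str, str], List[float]]


def local_avg_logprob(steps: Sequence[str], logprob_fn: LogProbFn,
                      window: int = 1) -> float:
    """Mean over steps of each step's mean token log-prob, where every
    step is conditioned only on the `window` preceding steps."""
    step_scores = []
    for i, step in enumerate(steps):
        # Local context: at most `window` steps immediately before step i.
        context = "\n".join(steps[max(0, i - window):i])
        token_lps = logprob_fn(context, step)
        if token_lps:  # skip steps that produce no tokens
            step_scores.append(sum(token_lps) / len(token_lps))
    if not step_scores:
        raise ValueError("no scorable steps")
    return sum(step_scores) / len(step_scores)


def select_most_natural(candidates: Sequence[Sequence[str]],
                        logprob_fn: LogProbFn,
                        window: int = 1) -> Sequence[str]:
    """Return the candidate solution with the highest local score."""
    return max(candidates,
               key=lambda s: local_avg_logprob(s, logprob_fn, window))


def toy_logprob_fn(context: str, step: str) -> List[float]:
    """Toy stand-in for a student model: a word already present in the
    context gets probability 0.5, any other word gets 0.1."""
    seen = set(context.split())
    return [math.log(0.5 if w in seen else 0.1) for w in step.split()]


coherent = ["x = 2", "x + x = 4"]      # second step reuses its premises
incoherent = ["x = 2", "y * z > 7"]    # second step ignores its premises
best = select_most_natural([incoherent, coherent], toy_logprob_fn)
```

In practice `logprob_fn` would wrap a forward pass of the student model over the context plus the step; the toy version only makes the sketch runnable without a model.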

Hoang Anh Just, Myeongseob Ko, Ruoxi Jia • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	--	895
Mathematical Reasoning	MATH 500	Accuracy (Acc) 85.6	543
Mathematical Reasoning	OlympiadBench	Accuracy 67.3	213
Mathematical Reasoning	AIME 2024 (test)	--	209
Mathematical Reasoning	AMC 2025 (test)	Acc@5 64.26	54
Mathematical Reasoning	GPQA Diamond (test)	Accuracy@5 56.23	54
Mathematical Reasoning	CN Middle School 24	Accuracy 83.3	51
Mathematical Reasoning	OlympiadBench	Accuracy 50.14	36
Science Reasoning	GPQA	GPQA Score 69.4	27
Mathematical Reasoning	AIME 24	Accuracy 61.66	16

Showing 10 of 25 rows
