
How to Steal Reasoning Without Reasoning Traces

About

Many large language models (LLMs) use reasoning to generate responses but do not reveal their full reasoning traces (a.k.a. chains of thought), instead outputting only final answers and brief reasoning summaries. To demonstrate that hiding reasoning traces does not prevent users from "stealing" a model's reasoning capabilities, we introduce trace inversion models that, given only the inputs, answers, and (optionally) reasoning summaries exposed by a target model, generate detailed, synthetic reasoning traces. We show that (1) traces synthesized by trace inversion have high overlap with the ground-truth reasoning traces (when available), and (2) fine-tuning student models on inverted traces substantially improves their reasoning and enables distillation from proprietary, black-box LLMs.

Tingwei Zhang, John X. Morris, Vitaly Shmatikov • 2026
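The pipeline described in the abstract can be illustrated with a minimal sketch. It shows how the fields exposed by a target model (input, answer, optional summary) might be packed into a prompt for a trace inversion model, how a synthesized trace might be turned into a fine-tuning example for a student, and how overlap with a ground-truth trace might be scored. Everything here is an assumption made for illustration: the `TargetOutput` schema, the prompt template, the student example format, and the unigram F1 overlap measure are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetOutput:
    """What a black-box target model exposes for one query (hypothetical schema)."""
    question: str
    answer: str
    summary: Optional[str] = None  # reasoning summary, if the target provides one


def build_inversion_prompt(out: TargetOutput) -> str:
    """Format the exposed fields as input to a trace inversion model.

    The template is an illustrative assumption, not the paper's format.
    """
    parts = [f"Question:\n{out.question}", f"Final answer:\n{out.answer}"]
    if out.summary:
        parts.append(f"Reasoning summary:\n{out.summary}")
    parts.append("Detailed reasoning trace:")
    return "\n\n".join(parts)


def build_student_example(out: TargetOutput, synthetic_trace: str) -> dict:
    """Pair the question with the synthesized trace and the target's answer,
    producing one supervised fine-tuning record (assumed prompt/completion layout)."""
    completion = f"{synthetic_trace}\n\nAnswer: {out.answer}"
    return {"prompt": out.question, "completion": completion}


def token_overlap_f1(candidate: str, reference: str) -> float:
    """Unigram F1 between two traces, using whitespace tokenization.

    A stand-in overlap measure; the paper's actual metric is not stated here.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(cand)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the string returned by `build_inversion_prompt` would be fed to whatever inversion model is available, and its output passed to `build_student_example`. When the target does reveal its trace, `token_overlap_f1` gives a rough check on how close the synthesized trace is.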

Related benchmarks

Task                    Dataset        Metric                        Result  Rank
Mathematical Reasoning  AIME 2024      Accuracy                      11.1    479
Mathematical Reasoning  MATH 500       Accuracy                      77.6    442
Mathematical Reasoning  MATH 500       Accuracy                      72      116
Code Reasoning          LiveCodeBench  Accuracy                      28.9    90
Math Reasoning          JEEBench       Accuracy                      44.9    82
Mathematical Reasoning  JEE Math       JEE Math Reasoning Score (s)  23.9    13
Downstream Accuracy     MATH500        Accuracy                      71.8    12
Downstream Accuracy     JEEBench       Accuracy                      36.3    12
Downstream Accuracy     LiveCodeBench  Accuracy                      30.9    12
Coding                  LiveCodeBench  Accuracy                      29.1    10
