
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

About

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores the rich intrinsic signals in model internals. We propose SAERL, a data engineering framework for LLM reinforcement learning (RL). It models three intrinsic data properties (diversity, difficulty, and quality) using model internals extracted with a Sparse Autoencoder (SAE), an advanced mechanistic interpretability tool. Each property grounds a concrete data engineering operation: SAE-space clustering with moderate batch mixing for batch diversity control, a difficulty proxy for easy-to-hard curriculum ordering, and a quality probe for data filtering. SAERL improves average accuracy by 3.00% over vanilla GRPO and reaches target accuracy with 20% fewer training steps on Qwen2.5-Math-1.5B, with consistent gains across model scales and RL algorithms. Experiments show that SAEs transfer effectively across model families and scales, serving as a lightweight and reusable data engineering tool. These results demonstrate that model internals are a powerful and practical source of signals for post-training data engineering.
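The three operations named in the abstract (quality filtering, easy-to-hard ordering, and cluster-aware batch mixing) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the per-sample `quality`, `difficulty`, and `cluster_of` values are hypothetical inputs assumed to have been derived from SAE features beforehand, the function names are invented here, and the round-robin interleave is a simple stand-in for the "moderate batch mixing" the abstract mentions without specifying.

```python
from collections import defaultdict, deque
from typing import List, Sequence


def filter_by_quality(quality: Sequence[float], threshold: float) -> List[int]:
    """Keep indices whose (hypothetical) quality-probe score meets the threshold."""
    return [i for i, q in enumerate(quality) if q >= threshold]


def order_easy_to_hard(indices: Sequence[int], difficulty: Sequence[float]) -> List[int]:
    """Sort the kept indices by an assumed difficulty proxy, easiest first."""
    return sorted(indices, key=lambda i: difficulty[i])


def mix_clusters(ordered: Sequence[int], cluster_of: Sequence[int],
                 batch_size: int) -> List[List[int]]:
    """Interleave clusters round-robin, then cut the sequence into batches.

    Within each cluster the easy-to-hard order is preserved; across clusters
    the global order is only approximately preserved.
    """
    queues = defaultdict(deque)
    for i in ordered:
        queues[cluster_of[i]].append(i)
    cluster_ids = sorted(queues)
    mixed: List[int] = []
    while any(queues[c] for c in cluster_ids):
        for c in cluster_ids:
            if queues[c]:
                mixed.append(queues[c].popleft())
    return [mixed[k:k + batch_size] for k in range(0, len(mixed), batch_size)]


# Toy data: six samples with made-up scores and cluster labels.
quality = [0.9, 0.2, 0.8, 0.7, 0.95, 0.6]
difficulty = [0.5, 0.1, 0.3, 0.9, 0.2, 0.4]
cluster_of = [0, 0, 1, 1, 0, 1]

kept = filter_by_quality(quality, threshold=0.5)
ordered = order_easy_to_hard(kept, difficulty)
batches = mix_clusters(ordered, cluster_of, batch_size=2)
print(batches)  # [[4, 2], [0, 5], [3]]
```

In this toy run, sample 1 is dropped by the quality threshold, the remaining samples are sorted by difficulty, and each full batch draws from both clusters. A real pipeline would need a tunable mixing rate rather than a strict round-robin.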

Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, Lei Hou, Juanzi Li, Xiaozhi Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Mathematical Reasoning | MATH500 (test) | -- | 895 |
| Mathematical Reasoning | AIME 2024 (test) | -- | 209 |
| Mathematical Reasoning | OlympiadBench (test) | -- | 40 |
| Mathematical Reasoning | GSM8K (test) | Accuracy (Avg@8): 91.5 | 12 |
| Mathematical Reasoning | AMC23 | Training Steps: 100 | 12 |
| Mathematical Reasoning | MinervaMath | Steps to Target Accuracy: 220 | 12 |
| Mathematical Reasoning | Aggregate AIME, AMC, GSM8K, MATH, MNV, OLPD | Avg Training Steps to Target Acc: 173 | 12 |
| Mathematical Reasoning | AIME 24 | Training Steps: 20 | 12 |
| Mathematical Reasoning | GSM8K | Training Steps to Target Accuracy: 200 | 12 |
| Mathematical Reasoning | AMC 2023 (test) | Avg@8 Success Rate: 68.4 | 12 |
Showing 10 of 12 rows
