AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking

About

Recent studies have shown that large language models (LLMs), especially smaller ones, often lack robustness in grade school math (GSM) reasoning. In particular, they tend to experience performance drops when faced with distribution shifts, such as changes to numerical or nominal variables, or insertions of distracting clauses. A possible strategy to address this involves generating synthetic data to further "instantiate" reasoning problems on potential variations. In this work, we instead focus on the strategy of "abstracting" reasoning problems. This not only helps counteract distribution shifts but also facilitates the connection to symbolic tools for deriving solutions. Focusing on GSM, we find that this abstraction process is better acquired through reinforcement learning (RL) than just supervised fine-tuning, which often fails to produce faithful abstractions. Our method, AbstRaL -- which promotes abstract reasoning in LLMs using RL on granular abstraction data -- significantly mitigates performance degradation on recent GSM perturbation benchmarks. Besides, improving GSM robustness via AbstRaL is shown to also implicitly benefit LLMs' capabilities on OOD mathematical and general reasoning tasks, indicating that abstract thinking broadly enables better generalizability.
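The idea of "abstracting" a problem and handing it to a symbolic tool can be illustrated with a minimal sketch. This is not the authors' implementation: the sample problem, the placeholder names (`in0`, `in1`, `in2`), and the helpers `instantiate` and `evaluate` are all hypothetical, chosen only to show why an abstracted solution is unaffected by changes to the numbers.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical abstracted problem: concrete numbers are replaced by placeholders.
ABSTRACT_QUESTION = (
    "Ava has [in0] apples and buys [in1] bags with [in2] apples each. "
    "How many apples does she have now?"
)
# Hypothetical abstract answer: a formula over the same placeholders.
ABSTRACT_ANSWER = "in0 + in1 * in2"

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def instantiate(question, values):
    """Fill the placeholders of an abstract question with concrete values."""
    for name, value in values.items():
        question = question.replace(f"[{name}]", str(value))
    return question


def evaluate(expr, values):
    """Evaluate an arithmetic formula over named inputs (stand-in for a symbolic tool)."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return Fraction(values[node.id])
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return Fraction(node.value)
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


# The same abstraction answers any numerical variation of the problem.
original = {"in0": 5, "in1": 3, "in2": 4}
perturbed = {"in0": 2, "in1": 10, "in2": 6}
for values in (original, perturbed):
    print(instantiate(ABSTRACT_QUESTION, values), "->", evaluate(ABSTRACT_ANSWER, values))
```

Because the formula is written over placeholders instead of specific numbers, a perturbation of the numerical variables changes only the inputs passed to `evaluate`, not the reasoning itself.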

Silin Gao, Antoine Bosselut, Samy Bengio, Emmanuel Abbe • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | ARC Challenge | Accuracy | 52.3 | 749 |
| Reasoning | BBH | Accuracy | 54.8 | 507 |
| Mathematical Reasoning | ASDIV | Accuracy | 0.953 | 221 |
| Mathematical Reasoning | MAWPS | Accuracy | 98.5 | 219 |
| Mathematical Reasoning | MATH | Accuracy | 83.9 | 162 |
| Mathematical Reasoning | CollegeMATH | Accuracy | 47.2 | 161 |
| Mathematical Reasoning | TabMWP | Accuracy | 93.9 | 157 |
| Mathematical Reasoning | AQUA | Accuracy | 74.8 | 132 |
| Mathematical Reasoning | SAT Math | SAT Math Accuracy | 93.8 | 44 |
| Mathematical Reasoning | GSM-Symbolic Vary Num. | Accuracy | 90.66 | 36 |
Showing 10 of 18 rows
