Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

About

Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics from numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming both the SFT baseline trained on the full training data (73.97%) and proprietary models such as GPT-5 and Gemini-2.5-Pro. Furthermore, in the domain-shift experiments, TaNOS shows a nearly negligible cross-domain gap (<2pp), whereas standard SFT shows a gap of over 10pp. These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.
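
As a rough illustration of the three components named in the abstract, the sketch below anonymizes table headers, builds a program first, and then derives the answer by executing it, so the resulting pair is correct by construction. The abstract does not specify any formats, so everything here is an assumption made for illustration: the `col_i` placeholder scheme, the `op(#, #)` sketch notation, the four-operation vocabulary, and the helper names `anonymize_headers` and `build_pair` are hypothetical and are not taken from TaNOS.

```python
from typing import Callable, Dict, List

# Hypothetical operation vocabulary; the paper's actual set is not given here.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def anonymize_headers(table: Dict[str, List]) -> Dict[str, List]:
    """Replace each header with a neutral placeholder (assumed scheme: col_i)."""
    headers = [f"col_{i}" for i in range(len(table["headers"]))]
    return {"headers": headers, "rows": table["rows"]}


def build_pair(table: Dict[str, List], op: str, row: int,
               col_a: int, col_b: int) -> Dict[str, object]:
    """Program-first construction: choose the program, then execute it.

    The answer is computed from the table itself, so the program and the
    answer agree by construction.
    """
    a = table["rows"][row][col_a]
    b = table["rows"][row][col_b]
    name_a = table["headers"][col_a]
    name_b = table["headers"][col_b]
    return {
        # Assumed sketch notation: operation name with arguments masked.
        "sketch": f"{op}(#, #)",
        "program": f"{op}({name_a}[{row}], {name_b}[{row}])",
        # Templated question derived from the program, not the other way round.
        "question": f"In row {row}, what is the result of {op} "
                    f"applied to {name_a} and {name_b}?",
        "answer": OPS[op](a, b),
    }


# Toy table with made-up values, used only to exercise the functions.
table = {"headers": ["revenue", "cost"], "rows": [[120.0, 80.0], [150.0, 90.0]]}
anon = anonymize_headers(table)
pair = build_pair(anon, "subtract", row=0, col_a=0, col_b=1)
print(pair["program"], "=", pair["answer"])
```

Because the question text is generated from an already executable program, no human labeling is needed to check the answer. A real pipeline would presumably sample operations and cells at random and support multi-step programs.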

Hanjun Cho, Gahyun Yoo, Hanseong Kim, Jay-Yoon Lee • 2026

Related benchmarks

Task	Dataset	Result
Table Question Answering	Financial TableQA	Execution Accuracy 85.51	48
Financial Question Answering	FinQA	Accuracy 83.46	30
Program Generation	MultiHiertt	Program Accuracy 82.9	10
Program Generation	expert-curated Biology dataset	Program Accuracy 70.26	10
Financial Table Question Answering	MultiHiertt	Program Accuracy 70.01	4
Financial Table Question Answering	NumReason 500	Program Accuracy 83.89	4
