# Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

## About
Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging due to dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that in specialized vertical domains, performance is largely determined by the quality and the difficulty/verifiability profile of post-training data. We introduce **ODA-Fin-SFT-318k**, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought (CoT) supervision, and **ODA-Fin-RL-12k**, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard SFT and RL pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B model consistently surpasses open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release the ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
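The "difficulty- and verifiability-aware sampling" idea can be illustrated with a minimal sketch: sample several candidate solutions per task, score them with an automatic verifier, and keep only tasks whose empirical pass rate falls in a middle band (hard enough to drive learning, verifiable enough to yield a reliable reward). The function names, the band `[lo, hi]`, and the toy solver/verifier below are all illustrative assumptions, not the paper's actual pipeline.

```python
from typing import Callable, Iterable, List

def pass_rate(item: dict, solve: Callable, verify: Callable,
              n_samples: int = 8) -> float:
    """Fraction of n_samples solution attempts that the verifier accepts."""
    return sum(bool(verify(item, solve(item))) for _ in range(n_samples)) / n_samples

def select_hard_verifiable(items: Iterable[dict], solve: Callable,
                           verify: Callable, lo: float = 0.1,
                           hi: float = 0.7) -> List[dict]:
    """Keep items whose pass rate lies in [lo, hi]: not already solved
    (rate <= hi) yet still automatically checkable (rate >= lo)."""
    return [it for it in items if lo <= pass_rate(it, solve, verify) <= hi]

# Toy demo (hypothetical data): each item is answered correctly on a fixed
# fraction of attempts, simulated deterministically per item.
def make_solver() -> Callable:
    calls = {}
    def solve(item: dict):
        k = calls.get(item["id"], 0)
        calls[item["id"]] = k + 1
        # correct on the first round(frac * 8) attempts, wrong afterwards
        return item["answer"] if k < round(item["frac"] * 8) else None
    return solve

items = [
    {"id": 1, "frac": 1.0, "answer": 42},  # too easy -> dropped
    {"id": 2, "frac": 0.5, "answer": 7},   # hard but verifiable -> kept
    {"id": 3, "frac": 0.0, "answer": 9},   # never verified -> dropped
]
verify = lambda item, pred: pred == item["answer"]
pool = select_hard_verifiable(items, make_solver(), verify)
# only the item with a mid-range pass rate (id 2) survives
```

The band thresholds are a hyperparameter: widening `[lo, hi]` trades reward precision for task diversity, which is the balance the dataset curation targets.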
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sentiment Analysis | FOMC | -- | -- | 44 |
| Financial Reasoning | FinQA | Accuracy | 73.3 | 33 |
| Financial Reasoning | ConvFinQA | Accuracy | 80.4 | 23 |
| Sentiment Analysis | FPB | Weighted F1 | 0.834 | 15 |
| Sentiment Analysis | Headlines | Weighted F1 | 78.5 | 15 |
| Financial Knowledge | FinanceIQ | Accuracy | 74.2 | 15 |
| Financial Knowledge | Fineval | Accuracy | 77 | 15 |
| Numerical Reasoning | TATQA | Accuracy | 89.3 | 14 |
| Financial Knowledge | Finova | Accuracy | 54.6 | 14 |