
One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking

About

Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than pipelines of isolated components. While large proprietary models achieve strong performance, their closed weights, complexity, and high cost limit sustainability. Fine-tuning smaller open-weight models for individual AFC tasks can help, but it requires maintaining multiple specialized models, which is itself costly. We propose multi-task learning (MTL) as a more efficient alternative: fine-tuning a single model to perform claim detection, evidence ranking, and stance detection jointly. Using small decoder-only LLMs (e.g., Qwen3-4B), we explore three MTL strategies (classification heads, causal language modeling heads, and instruction tuning) and evaluate them across model sizes, task orders, and standard non-LLM baselines. While multi-task models do not universally surpass single-task baselines, they yield substantial improvements over zero-/few-shot settings, achieving up to 44%, 54%, and 31% relative gains for claim detection, evidence re-ranking, and stance detection, respectively. Finally, we provide practical, empirically grounded guidelines to help practitioners apply MTL with LLMs for automated fact-checking.
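To make the instruction-tuning variant concrete, here is a minimal sketch of how the three AFC tasks can be mixed into one instruction-tuning dataset for a single model. The task names, prompt templates, and label strings below are illustrative assumptions, not the authors' exact formats.

```python
# Hypothetical sketch of multi-task instruction-tuning data preparation:
# all three AFC tasks are rendered into (prompt, target) pairs and pooled,
# so a single decoder-only LLM is fine-tuned on them jointly.

# Assumed prompt templates -- the paper's actual instructions may differ.
PROMPTS = {
    "claim_detection": (
        "Is the following sentence a check-worthy claim? Answer Yes or No.\n"
        "Sentence: {text}"
    ),
    "evidence_ranking": (
        "Is the evidence relevant to the claim? Answer Relevant or Irrelevant.\n"
        "Claim: {claim}\nEvidence: {text}"
    ),
    "stance_detection": (
        "Does the evidence Support, Refute, or give Not Enough Info "
        "for the claim?\nClaim: {claim}\nEvidence: {text}"
    ),
}

def to_instruction_example(task, record):
    """Render one raw record into an instruction-tuning example."""
    prompt = PROMPTS[task].format(**record)
    return {"task": task, "prompt": prompt, "target": record["label"]}

def build_multitask_dataset(records_by_task):
    """Pool examples from all tasks so one model trains on them jointly."""
    examples = []
    for task, records in records_by_task.items():
        examples.extend(to_instruction_example(task, r) for r in records)
    return examples

data = build_multitask_dataset({
    "claim_detection": [
        {"text": "The earth is flat.", "label": "Yes"},
    ],
    "stance_detection": [
        {"claim": "The earth is flat.",
         "text": "Satellite imagery shows a spherical earth.",
         "label": "Refute"},
    ],
})
```

In practice the pooled examples would then be tokenized and fed to a standard causal-LM fine-tuning loop; shuffling or interleaving the tasks (rather than training them sequentially) is one of the ordering choices the paper evaluates.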

Malin Astrid Larsson, Harald Fosen Grunnaleite, Vinay Setty • 2026

Related benchmarks

Task                 Dataset               Metric      Result   Rank
Claim Detection      Claim Detection       F1 (Task)   89.55    9
Evidence Re-Ranking  Evidence Re-Ranking   Rel-F1      66.13    9
Stance Detection     Stance Detection      Sup-F1      76.33    9
