DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning

About

Test-time adaptation offers a promising avenue for improving reasoning performance in large language models without additional supervision, but existing approaches often apply a uniform optimization objective across all inputs, leading to inefficient or unstable adaptation on heterogeneous reasoning problems. We propose DiSCTT, a difficulty-aware, consensus-guided self-curriculum framework that dynamically allocates test-time optimization strategies based on instance-level epistemic uncertainty estimated from agreement among sampled reasoning trajectories. Inputs with high consensus are consolidated via supervised fine-tuning using majority-agreed solutions as pseudo-labels, while low-consensus inputs are optimized via reinforcement learning with a consensus-regularized objective that encourages diversity under relevance constraints. Across a broad suite of mathematical and general reasoning benchmarks, DiSCTT consistently outperforms strong test-time adaptation baselines, achieving higher accuracy with reduced variance and substantially lower computation and wall-clock training times. These results demonstrate that explicitly accounting for instance difficulty and uncertainty enables more stable, efficient, and effective test-time adaptation for reasoning models.

Mohammad Mahdi Moradi, Sudhir Mudur • 2026
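
The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`consensus`, `route`), the 0.5 agreement threshold, and the use of exact string matching on final answers are all assumptions made for illustration. The sketch only shows how agreement among sampled answers could split inputs into a pseudo-labeled supervised set and a reinforcement-learning set; the training steps themselves are omitted.

```python
from collections import Counter


def consensus(answers):
    """Return (majority_answer, agreement_ratio) for one question's sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def route(samples_by_question, threshold=0.5):
    """Split questions by consensus level.

    threshold is a hypothetical cutoff, not a value taken from the paper.
    High-consensus questions get the majority answer as a pseudo-label
    (candidates for supervised fine-tuning); the rest are set aside for
    reinforcement learning.
    """
    sft_set = []  # (question, pseudo_label) pairs
    rl_set = []   # questions with low agreement
    for question, answers in samples_by_question.items():
        label, ratio = consensus(answers)
        if ratio >= threshold:
            sft_set.append((question, label))
        else:
            rl_set.append(question)
    return sft_set, rl_set


if __name__ == "__main__":
    samples = {
        "q1": ["4", "4", "4", "5"],  # 3 of 4 samples agree
        "q2": ["1", "2", "3", "4"],  # no agreement
    }
    sft, rl = route(samples)
    print("SFT pseudo-labels:", sft)
    print("RL questions:", rl)
```

In this toy input, `q1` would be consolidated with the pseudo-label `"4"`, while `q2` would fall to the low-consensus branch. A real system would need answer normalization before voting, which is left out here.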

Related benchmarks

Task	Dataset	Result
General Reasoning	MMLU	MMLU Accuracy 83.3	180
General Reasoning	GPQA	Accuracy 38.4	59
Mathematical Reasoning	MATH 500	Mean@10.822	55
Mathematical Reasoning	AMC	Mean Accuracy 59.5	24
Mathematical Reasoning	AIME 2024	Mean Accuracy 29.6	24
Multi-hop Question Answering	HotpotQA	Mean Accuracy 73.7	24

