Logic-Regularized Verifier Elicits Reasoning from LLMs

About

Verifiers are crucial components for enhancing modern LLMs' reasoning capability. Typical verifiers require resource-intensive supervised dataset construction, which is costly and faces limitations in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, utilizing internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, and inter-group consistency (grouped by the final answer). By incorporating logical rules as priors, LOVER can leverage unlabeled examples and is directly compatible with any off-the-shelf LLM. Experiments on 10 datasets demonstrate that LOVER significantly outperforms unsupervised baselines, achieving performance comparable to the supervised verifier (reaching its 95% level on average). The source code is publicly available at https://github.com/wangxinyufighting/llm-lover.
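The three constraints named in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulas, so every functional form below is an assumption chosen for illustration, not the authors' implementation: negation consistency is taken to mean that the score of a path and the score of its negation sum to 1, intra-group consistency that paths sharing a final answer receive similar scores, and inter-group consistency that at most one answer group can be correct. All function names are hypothetical, and the scores stand in for verifier probabilities that LOVER would derive from internal activations.

```python
from collections import defaultdict


def negation_consistency(p_pos, p_neg):
    """Mean squared violation of p(x) + p(not x) = 1 (assumed form)."""
    return sum((a + b - 1.0) ** 2 for a, b in zip(p_pos, p_neg)) / len(p_pos)


def group_by_answer(scores, answers):
    """Group verifier scores by the final answer of each reasoning path."""
    groups = defaultdict(list)
    for score, answer in zip(scores, answers):
        groups[answer].append(score)
    return dict(groups)


def intra_group_consistency(groups):
    """Mean within-group variance: paths with the same answer should agree (assumed form)."""
    total = 0.0
    for scores in groups.values():
        mean = sum(scores) / len(scores)
        total += sum((s - mean) ** 2 for s in scores) / len(scores)
    return total / len(groups)


def inter_group_consistency(groups):
    """Squared excess of summed group means over 1: at most one answer is right (assumed form)."""
    mass = sum(sum(scores) / len(scores) for scores in groups.values())
    return max(0.0, mass - 1.0) ** 2


# Toy data: four reasoning paths, two distinct final answers.
p_pos = [0.9, 0.8, 0.3, 0.2]   # hypothetical score for "this path is correct"
p_neg = [0.1, 0.2, 0.7, 0.8]   # hypothetical score for the negated statement
answers = ["42", "42", "17", "17"]

groups = group_by_answer(p_pos, answers)
loss_neg = negation_consistency(p_pos, p_neg)
loss_intra = intra_group_consistency(groups)
loss_inter = inter_group_consistency(groups)
total_loss = loss_neg + loss_intra + loss_inter  # equal weighting is also an assumption
```

None of these terms needs a correctness label, which is what would let such a verifier train on unlabeled examples: the only inputs are the scores and the final answers of the sampled paths.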

Xinyu Wang, Changzhi Sun, Lian Cheng, Yuanbin Wu, Dell Zhang, Xiaoling Wang, Xuelong Li • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy (Acc) 92.11	352
Knowledge Reasoning	MMLU-Pro	Accuracy 61.24	148
Open-domain Question Answering	HotpotQA	Accuracy 83.8	103
Mathematical Reasoning	iGSM	Accuracy 54.25	21
Logical Reasoning	BIG-Bench Hard Web of Lies (OOD)	Accuracy 44.8	3
Mathematical Reasoning	BIG-Bench Hard Multi-Step Arithmetic Two (OOD)	Accuracy 12.4	3
Reasoning	BIG-Bench Hard Object Counting (OOD)	Accuracy 48.4	3
Spatial Reasoning	BIG-Bench Hard Navigate (OOD)	Accuracy 57.6	3
Causal Reasoning	BIG-Bench Hard Causal Judgment (OOD)	Accuracy 60.4	3
Logical Reasoning	BIG-Bench Hard Boolean Expression (OOD)	Accuracy 72.4	3
