Logic-Regularized Verifier Elicits Reasoning from LLMs

About

Verifiers are crucial components for enhancing the reasoning capability of modern LLMs. Typical verifiers require resource-intensive supervised dataset construction, which is costly and limited in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, using the LLM's internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, and inter-group consistency (with paths grouped by their final answer). By incorporating logical rules as priors, LOVER can leverage unlabeled examples and is directly compatible with any off-the-shelf LLM. Experiments on 10 datasets demonstrate that LOVER significantly outperforms unsupervised baselines and achieves performance comparable to the supervised verifier (reaching, on average, 95% of its performance). The source code is publicly available at https://github.com/wangxinyufighting/llm-lover.
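
The abstract names the three logical constraints but does not give their formulas. The sketch below shows one plausible way to turn them into penalty terms on a probe over hidden activations, similar in spirit to contrast-consistent search. The function names, the linear probe, the squared-error form of each term, and especially the reading of inter-group consistency as "scores of mutually exclusive answers should not sum to more than 1" are assumptions for illustration, not LOVER's actual objective; the linked repository is the authoritative source.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def consistency_losses(h_pos, h_neg, answers, w, b=0.0):
    """Illustrative (hypothetical) consistency penalties for an unsupervised verifier.

    h_pos:   (N, d) activations for the statement "path i is correct"
    h_neg:   (N, d) activations for the negated statement "path i is incorrect"
    answers: length-N final answers; paths are grouped by this value
    w, b:    parameters of an assumed linear probe on the activations
    """
    p_pos = sigmoid(h_pos @ w + b)  # probe's P(statement is true)
    p_neg = sigmoid(h_neg @ w + b)  # probe's P(negated statement is true)

    # Negation consistency: P(x) and P(not x) should sum to 1.
    l_neg = float(np.mean((p_pos - (1.0 - p_neg)) ** 2))

    # Per-path verifier score that averages both views of the same path.
    score = 0.5 * (p_pos + (1.0 - p_neg))

    # Group path indices by their final answer.
    groups = {}
    for i, ans in enumerate(answers):
        groups.setdefault(ans, []).append(i)
    group_score = {ans: float(score[idx].mean()) for ans, idx in groups.items()}

    # Intra-group consistency: paths reaching the same answer should agree.
    l_intra = float(np.mean([np.mean((score[idx] - group_score[ans]) ** 2)
                             for ans, idx in groups.items()]))

    # Inter-group consistency (ASSUMED form): distinct final answers are
    # mutually exclusive, so their group scores should not sum above 1.
    l_inter = max(0.0, sum(group_score.values()) - 1.0) ** 2

    return {"negation": l_neg, "intra": l_intra, "inter": l_inter,
            "group_score": group_score}


if __name__ == "__main__":
    # Toy data: two paths answer "42" with high probe scores, one answers "17".
    h = np.array([[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])
    out = consistency_losses(h, -h, ["42", "42", "17"], w=np.array([1.0, 0.0]))
    print(out)
    # Pick the answer whose group has the highest verifier score.
    print(max(out["group_score"], key=out["group_score"].get))  # prints 42
```

In a real setup these terms would be minimized jointly over the probe parameters on unlabeled questions. As written, the objective still admits a trivial solution in which every score is 0.5, so a practical version would need an extra term (for example, a confidence penalty) to rule that out.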

Xinyu Wang, Changzhi Sun, Lian Cheng, Yuanbin Wu, Dell Zhang, Xiaoling Wang, Xuelong Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 92.11 | 337 |
| Knowledge Reasoning | MMLU-Pro | Accuracy | 61.24 | 120 |
| Open-domain Question Answering | HotpotQA | Accuracy | 83.8 | 73 |
| Mathematical Reasoning | iGSM | Accuracy | 54.25 | 21 |
| Logical Reasoning | BIG-Bench Hard Web of Lies (OOD) | Accuracy | 44.8 | 3 |
| Mathematical Reasoning | BIG-Bench Hard Multi-Step Arithmetic Two (OOD) | Accuracy | 12.4 | 3 |
| Reasoning | BIG-Bench Hard Object Counting (OOD) | Accuracy | 48.4 | 3 |
| Spatial Reasoning | BIG-Bench Hard Navigate (OOD) | Accuracy | 57.6 | 3 |
| Causal Reasoning | BIG-Bench Hard Causal Judgment (OOD) | Accuracy | 60.4 | 3 |
| Logical Reasoning | BIG-Bench Hard Boolean Expression (OOD) | Accuracy | 72.4 | 3 |

OOD: out-of-distribution.
