Logic-Regularized Verifier Elicits Reasoning from LLMs
About
Verifiers are crucial components for enhancing modern LLMs' reasoning capability. Typical verifiers require resource-intensive supervised dataset construction, which is costly and faces limitations in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, utilizing internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, and inter-group consistency (where paths are grouped by their final answer). By incorporating logical rules as priors, LOVER can leverage unlabeled examples and is directly compatible with any off-the-shelf LLM. Experiments on 10 datasets demonstrate that LOVER significantly outperforms unsupervised baselines, achieving performance comparable to the supervised verifier (reaching 95% of its level on average). The source code is publicly available at https://github.com/wangxinyufighting/llm-lover.
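To make the three constraints concrete, the following is a minimal sketch of how such penalties could be written. It is not the authors' implementation: the abstract gives no formulas, so every function name, the squared-error form of each penalty, and the equal weighting are assumptions made for illustration. The sketch assumes a verifier that outputs, for each reasoning path, a probability `p_pos` that the path is correct and a probability `p_neg` for a negated version of the same statement.

```python
from collections import defaultdict
from statistics import mean


def negation_loss(p_pos, p_neg):
    """Assumed form: a statement and its negation should have probabilities summing to 1."""
    return mean((a + b - 1.0) ** 2 for a, b in zip(p_pos, p_neg))


def group_by_answer(answers, p_pos):
    """Collect verifier scores of reasoning paths that share a final answer."""
    groups = defaultdict(list)
    for answer, score in zip(answers, p_pos):
        groups[answer].append(score)
    return groups


def intra_group_loss(groups):
    """Assumed form: paths with the same final answer should receive similar scores."""
    per_group = []
    for scores in groups.values():
        center = mean(scores)
        per_group.append(mean((s - center) ** 2 for s in scores))
    return mean(per_group)


def inter_group_loss(groups):
    """Assumed form: at most one final answer is correct, so group means sum to at most 1."""
    total = sum(mean(scores) for scores in groups.values())
    return max(0.0, total - 1.0) ** 2


def logic_regularizer(answers, p_pos, p_neg, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three penalties; the weights are hypothetical."""
    groups = group_by_answer(answers, p_pos)
    w_neg, w_intra, w_inter = weights
    return (
        w_neg * negation_loss(p_pos, p_neg)
        + w_intra * intra_group_loss(groups)
        + w_inter * inter_group_loss(groups)
    )


# Three sampled reasoning paths for one question; two agree on the answer "18".
answers = ["18", "18", "20"]
consistent = logic_regularizer(answers, [0.9, 0.9, 0.1], [0.1, 0.1, 0.9])
inconsistent = logic_regularizer(answers, [0.9, 0.2, 0.8], [0.1, 0.1, 0.9])
```

Under these assumptions, scores that respect all three rules incur (near) zero penalty, while scores that disagree within an answer group or assign high confidence to two different answers are penalized. No labels are needed to compute any of the terms, which is the sense in which logical rules can act as priors over unlabeled examples.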
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 92.11 | 337 |
| Knowledge Reasoning | MMLU-Pro | Accuracy | 61.24 | 120 |
| Open-domain Question Answering | HotpotQA | Accuracy | 83.8 | 73 |
| Mathematical Reasoning | iGSM | Accuracy | 54.25 | 21 |
| Logical Reasoning | BIG-Bench Hard Web of Lies (OOD) | Accuracy | 44.8 | 3 |
| Mathematical Reasoning | BIG-Bench Hard Multi-Step Arithmetic Two (OOD) | Accuracy | 12.4 | 3 |
| Reasoning | BIG-Bench Hard Object Counting (OOD) | Accuracy | 48.4 | 3 |
| Spatial Reasoning | BIG-Bench Hard Navigate (OOD) | Accuracy | 57.6 | 3 |
| Causal Reasoning | BIG-Bench Hard Causal Judgment (OOD) | Accuracy | 60.4 | 3 |
| Logical Reasoning | BIG-Bench Hard Boolean Expression (OOD) | Accuracy | 72.4 | 3 |