ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models
About
We present ThinknCheck, a 1B-parameter verifier for grounded claim verification that first produces a short, structured rationale and then a binary verdict. We construct LLMAggreFact-Think, a 24.1k reasoning-augmented training set derived from LLMAggreFact, and fine-tune a 4-bit Gemma3 model to follow this format. On LLMAggreFact, ThinknCheck attains 78.1 balanced accuracy (BAcc), surpassing MiniCheck-7B (77.4) with 7x fewer parameters; removing the reasoning step reduces BAcc to 57.5. On SciFact, ThinknCheck reaches 64.7 BAcc, a +14.7 absolute gain over MiniCheck-7B. By contrast, zero-shot chain-of-thought on the base Gemma3-1B harms accuracy relative to direct answers, and preference optimization with a simple format+accuracy reward underperforms supervised reasoning. To probe the latter, we introduce GSMClaims and a domain-specialized variant, ThinknCheck-Science, which improves across benchmarks, including 61.0% accuracy on GSMClaims. Overall, explicit, supervised reasoning enables compact verifiers that are competitive while remaining resource-efficient and interpretable.
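The headline metric above is balanced accuracy (BAcc). As a minimal sketch, assuming the standard definition for binary labels (the mean of recall on the supported class and recall on the unsupported class), it can be computed as follows. The function name, the label encoding (1 = supported, 0 = unsupported), and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels.

    Assumed encoding (illustrative): 1 = claim supported, 0 = unsupported.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Correct predictions within each class.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Class sizes in the gold labels.
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")

    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Toy, made-up labels: four supported claims, two unsupported.
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(labels, preds)
```

Because each class contributes equally regardless of its size, a verifier that always predicts the majority label scores 0.5 under this metric, which is why it is a common choice when supported and unsupported claims are imbalanced.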
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Scientific Fact Verification | SciFact | -- | 25 |
| Factuality Evaluation | LLM-AGGREFACT (test) | -- | 13 |
| Claim Verification | LLMAggreFact (test) | Binary Accuracy: 78.1 | 9 |
| Fact Verification | GSMClaims | Accuracy: 61 | 4 |
| Claim Verification | SciFact (dev) | BAcc: 64.7 | 3 |