MetaCrit: A Critical Thinking Framework for Self-Regulated LLM Reasoning
About
Large language models (LLMs) fail on over one-third of multi-hop questions with counterfactual premises and remain vulnerable to adversarial prompts that trigger biased or factually incorrect responses, which exposes a fundamental deficit in self-regulated reasoning. We propose **MetaCrit**, a multi-agent framework grounded in Nelson and Narens' metacognitive regulation theory. MetaCrit decomposes reasoning regulation into four agents: object-level generation, a *monitoring* agent that assesses response validity, a *control* agent that critiques logical soundness, and a meta-level synthesizer that integrates all signals into a final response. Evaluation across eight benchmarks, four model backbones, and a college-level analytical writing study shows that MetaCrit significantly improves content truthfulness and logical soundness while eliminating toxic outputs. Its modular design allows individual agents to be integrated into existing frameworks as drop-in components without architectural modifications.
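The four-agent decomposition described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the `RegulationTrace` container, the `run_pipeline` function, and all prompt wording are hypothetical, and the abstract does not specify whether the monitoring and control agents see each other's output (here they both read only the draft).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class RegulationTrace:
    """Outputs of the four stages, kept for inspection (hypothetical container)."""
    draft: str        # object-level generation
    monitoring: str   # assessment of response validity
    control: str      # critique of logical soundness
    final: str        # meta-level synthesis


def run_pipeline(question: str, llm: LLM) -> RegulationTrace:
    """Run the four stages in sequence. All prompts are illustrative placeholders."""
    # 1. Object level: produce an initial response.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # 2. Monitoring agent: assess whether the draft is valid.
    monitoring = llm(
        "Assess the validity of this response.\n"
        f"Question: {question}\nResponse: {draft}"
    )

    # 3. Control agent: critique the draft's logical soundness.
    control = llm(
        "Critique the logical soundness of this response.\n"
        f"Question: {question}\nResponse: {draft}"
    )

    # 4. Meta-level synthesizer: integrate all signals into a final response.
    final = llm(
        "Write a final response using the draft and both assessments.\n"
        f"Question: {question}\nDraft: {draft}\n"
        f"Validity assessment: {monitoring}\nLogic critique: {control}"
    )
    return RegulationTrace(draft, monitoring, control, final)
```

Because each stage is a plain call to the same `llm` function, a single stage (for example, only the monitoring step) could be lifted out and reused elsewhere, which is one way to read the "drop-in component" claim above.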
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Truthfulness | TruthfulQA | Truthfulness Accuracy | 97.55 | 86 |
| Toxicity Evaluation | BoLD | Toxic Rate | 0.00 | 26 |
| Logical Coherence | CIAR | Accuracy | 96 | 12 |
| Safety Evaluation | HONEST | Score | 0.00 | 12 |
| Analytical and personal anecdote writing | User study (n=45) | Preference Rate (Critical Thinking) | 41.7 | 3 |