Fine-Tuning Language Models to Know What They Know
About
Evaluating true metacognition in Large Language Models (LLMs) is difficult due to biases and heuristics. This paper presents a framework to measure and enhance LLM metacognition while controlling for these biases. A measurement method using the $d'_{\rm type2}$ metric is established to isolate metacognitive ability. The Evolution Strategy for Metacognitive Alignment (ESMA) is proposed, demonstrating robust generalization across unseen datasets, languages, and newly acquired knowledge. Finally, parameter analysis reveals that these improvements are driven by a sparse set of parameters, offering new pathways for targeted metacognitive optimization.
Sangjun Park, Elliot Meyerson, Xin Qiu, Risto Miikkulainen • 2026
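For readers unfamiliar with the $d'_{\rm type2}$ metric named in the abstract, the sketch below shows the textbook signal-detection form of a type-2 sensitivity score: the z-transformed rate of claiming knowledge on correct answers minus the z-transformed rate of claiming knowledge on incorrect answers. It is an illustration under assumptions, not the paper's exact procedure. The function name `type2_dprime`, the binary "confident" judgment, and the log-linear correction are all choices made here for the example.

```python
from statistics import NormalDist


def type2_dprime(correct, confident):
    """Type-2 d' from paired binary outcomes (hypothetical helper).

    correct[i]   -- True if the model's answer to item i was right.
    confident[i] -- True if the model claimed to know the answer to item i.
    """
    if len(correct) != len(confident):
        raise ValueError("correct and confident must have the same length")

    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect item")

    # Type-2 hit: confident on a correct answer.
    hits = sum(1 for c, k in zip(correct, confident) if c and k)
    # Type-2 false alarm: confident on an incorrect answer.
    false_alarms = sum(1 for c, k in zip(correct, confident) if not c and k)

    # Log-linear correction (an assumption of this sketch) keeps both
    # rates strictly inside (0, 1) so the inverse CDF stays finite.
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)

    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    correct = [True, True, True, True, False, False, False, False]
    confident = [True, True, True, False, False, False, False, True]
    print(round(type2_dprime(correct, confident), 3))
```

A score near zero means the confidence judgments carry no information about correctness, regardless of how often the model answers correctly or how often it claims to know. That separation from accuracy and response bias is the usual reason for preferring a sensitivity measure of this kind.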
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | NQ-Open (val) | Accuracy: 29 | 46 |
| Metacognitive Ability | TriviaQA | $d'_{\rm type2}$: 1.02 | 17 |
| Question Answering | Freebase QA (test) | $d'_{\rm type2}$ Score: 117 | 6 |
| Question Answering | Web Questions (test) | $d'_{\rm type2}$: 0.66 | 6 |