Fine-Tuning Language Models to Know What They Know

About

Evaluating true metacognition in Large Language Models (LLMs) is difficult due to biases and heuristics. This paper presents a framework to measure and enhance LLM metacognition while controlling for these biases. A measurement method using the $d'_{\rm type2}$ metric is established to isolate metacognitive ability. The Evolution Strategy for Metacognitive Alignment (ESMA) is proposed, demonstrating robust generalization across unseen datasets, languages, and newly acquired knowledge. Finally, parameter analysis reveals that these improvements are driven by a sparse set of parameters, offering new pathways for targeted metacognitive optimization.
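As a rough illustration of the kind of quantity $d'_{\rm type2}$ measures, the sketch below computes a standard signal-detection type-2 sensitivity from two binary lists: whether each answer was correct, and whether the model claimed to know the answer. This is an assumption about the general form of the metric, not the paper's exact procedure. The function name `type2_dprime`, its input format, and the log-linear correction are illustrative choices that do not come from the source.

```python
from statistics import NormalDist


def type2_dprime(correct, claims_known):
    """Type-2 d': z(hit rate) - z(false-alarm rate).

    A "hit" is claiming to know an answer that turned out correct;
    a "false alarm" is claiming to know an answer that turned out wrong.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    if len(correct) != len(claims_known):
        raise ValueError("inputs must have the same length")

    hits = sum(1 for c, k in zip(correct, claims_known) if c and k)
    misses = sum(1 for c, k in zip(correct, claims_known) if c and not k)
    false_alarms = sum(1 for c, k in zip(correct, claims_known) if not c and k)
    correct_rejections = sum(
        1 for c, k in zip(correct, claims_known) if not c and not k
    )

    # Log-linear correction (an assumed choice): keeps both rates strictly
    # inside (0, 1) so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Toy data: the model's "I know" claims track correctness perfectly.
    correct = [True, True, False, False]
    claims = [True, True, False, False]
    print(type2_dprime(correct, claims))
```

Under this definition, a value near zero means the model's claims of knowledge carry no information about whether its answers are correct, and larger positive values mean the claims discriminate better between correct and incorrect answers.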

Sangjun Park, Elliot Meyerson, Xin Qiu, Risto Miikkulainen • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	NQ-Open (val)	Accuracy 29	46
Metacognitive Ability	TriviaQA	d'type2 1.02	17
Question Answering	Freebase QA (test)	d'type2 Score 117	6
Question Answering	Web Questions (test)	d'type2 0.66	6

