
Fine-Tuning Language Models to Know What They Know

About

Evaluating genuine metacognition in large language models (LLMs) is difficult because measurements can be confounded by biases and heuristics. This paper presents a framework for measuring and enhancing LLM metacognition while controlling for these biases. It establishes a measurement method based on the $d'_{\rm type2}$ metric to isolate metacognitive ability, and it proposes the Evolution Strategy for Metacognitive Alignment (ESMA), which generalizes robustly to unseen datasets, languages, and newly acquired knowledge. Finally, parameter analysis shows that these improvements are driven by a sparse set of parameters, suggesting new pathways for targeted metacognitive optimization.
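The $d'_{\rm type2}$ metric comes from signal detection theory: it treats the model's confidence as a detector of its own correctness, so $d'_{\rm type2} = z(H_2) - z(F_2)$, where $H_2$ is the rate of high confidence on correct answers, $F_2$ the rate of high confidence on incorrect answers, and $z$ the inverse standard normal CDF. The following is a minimal sketch of that textbook definition, assuming binary confidence labels and a log-linear correction for rates of 0 or 1; how the paper elicits confidence and handles these edge cases may differ, and the function name `d_prime_type2` is hypothetical.

```python
from statistics import NormalDist


def d_prime_type2(correct, confident, eps=0.5):
    """Type-2 sensitivity: how well confidence separates correct from incorrect answers.

    correct:   list of bools, True if the model's answer was right
    confident: list of bools, True if the model reported high confidence
    eps:       log-linear correction (assumed here) so that rates of exactly
               0 or 1 do not produce infinite z-scores
    """
    if len(correct) != len(confident):
        raise ValueError("correct and confident must have the same length")
    n_correct = sum(correct)
    n_incorrect = len(correct) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect answer")

    # Type-2 hits: confident when correct; type-2 false alarms: confident when wrong.
    hits = sum(1 for c, k in zip(correct, confident) if c and k)
    false_alarms = sum(1 for c, k in zip(correct, confident) if not c and k)

    # Corrected rates: add eps to each count and 2 * eps to each total.
    hit_rate = (hits + eps) / (n_correct + 2 * eps)
    fa_rate = (false_alarms + eps) / (n_incorrect + 2 * eps)

    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Usage: confidence is right more often than chance, so the score is positive.
correct = [True, True, True, False, False, False]
confident = [True, True, False, False, False, True]
print(round(d_prime_type2(correct, confident), 3))
```

Because the score compares a hit rate against a false-alarm rate rather than counting agreements, a model that simply reports high confidence more often raises both rates and gains little, which is presumably why a d'-style metric suits the bias control the abstract describes.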

Sangjun Park, Elliot Meyerson, Xin Qiu, Risto Miikkulainen • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | NQ-Open (val) | Accuracy | 29 | 46
Metacognitive Ability | TriviaQA | $d'_{\rm type2}$ | 1.02 | 17
Question Answering | Freebase QA (test) | $d'_{\rm type2}$ score | 117 | 6
Question Answering | Web Questions (test) | $d'_{\rm type2}$ | 0.66 | 6
