Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

About

Knowledge components (KCs) mapped to problems help model student learning by tracking students' mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting KCs and tagging them to problems, traditionally performed by human domain experts, is highly labor-intensive. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework, which we refer to as KCGen-KT, to leverage these LLM-generated KCs. We conduct extensive quantitative and qualitative evaluations on two real-world student code submission datasets in different programming languages. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.

Zhangqi Duan, Nigel Fernandez, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Rafaella Sampaio de Alencar, Peter Brusilovsky, Bita Akram, Andrew Lan • 2025

Related benchmarks

Task	Dataset	Result
Knowledge Tracing Correctness Prediction	CodeWorkout Java	AUC 0.816	6
Knowledge Tracing Correctness Prediction	FalconCode Python	AUC 77.1	6
Code Prediction	CodeWorkout Java	CodeBLEU 58	3
Code Prediction	FalconCode Python	CodeBLEU 49.8	3
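
For readers unfamiliar with the AUC metric reported for correctness prediction above, the following is a minimal sketch of how it can be computed from binary correctness labels and predicted probabilities. It is not the authors' evaluation code: the function name `auc` and the toy data are hypothetical, chosen only for illustration. Note that the two AUC rows appear to use different scales (a fraction for CodeWorkout, a percentage for FalconCode); the sketch returns a fraction in [0, 1].

```python
def auc(labels, scores):
    """Pairwise AUC: the probability that a randomly chosen correct
    submission receives a higher score than a randomly chosen incorrect
    one, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one correct and one incorrect label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not taken from the paper):
# 1 = submission was correct, 0 = incorrect.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
toy_auc = auc(labels, scores)
print(f"AUC as fraction: {toy_auc:.3f}, as percentage: {100 * toy_auc:.1f}")
```

This quadratic pairwise form is chosen for readability; for large evaluation sets a rank-based implementation from an established library would be the more practical choice.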

