CodeGENCAT: Generative Computerized Adaptive Testing for Open-ended Coding Problems

About

Existing Computerized Adaptive Testing (CAT) frameworks typically select questions based on the predicted likelihood that the student will answer correctly. This design ignores information contained in students' open-ended responses, especially in domains such as programming education, where code structures and bugs contain rich information on student knowledge. In this work, we propose Code GENerative CAT (CodeGENCAT), a generative CAT framework that selects questions using predicted student code responses. First, we develop a Generative Item Response Theory (GIRT) model that generates code responses conditioned on estimated student knowledge, trained with supervised fine-tuning followed by direct preference optimization for knowledge-response alignment. Second, we introduce three question-selection algorithms that measure uncertainty, coding style diversity, and information from predicted student code responses. Experiments on two real-world programming education datasets show that CodeGENCAT outperforms all CAT baselines, achieving an AUC improvement of up to 4.32% over the strongest baseline in the early stages of adaptive testing.
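
For context, the correctness-based selection that the abstract contrasts against can be sketched with a textbook rule: pick the unasked question with maximum Fisher information under a two-parameter logistic (2PL) IRT model. This is only an illustration of that classic baseline idea, not CodeGENCAT's method or any specific baseline from the paper. The item bank, the parameter values, and the function names below are all hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability of a correct answer for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items, administered):
    """Return the id of the not-yet-asked item with the highest Fisher
    information at the current ability estimate, or None if none remain."""
    candidates = [
        (fisher_information(theta, a, b), item_id)
        for item_id, (a, b) in items.items()
        if item_id not in administered
    ]
    if not candidates:
        return None
    return max(candidates)[1]


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
ITEMS = {"q1": (1.0, -1.0), "q2": (1.0, 0.0), "q3": (1.0, 2.0)}

if __name__ == "__main__":
    print(select_next_item(0.0, ITEMS, administered=set()))
```

With these made-up parameters and an ability estimate of 0, the rule picks `q2`, the item whose difficulty matches the estimate. Note that it uses only the predicted probability of correctness, which is the limitation the abstract points to.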

Wanyong Feng, Alexander Scarlatos, Ruochen Sun, Andrew Lan • 2026

Related benchmarks

Task	Dataset	Result
Code Similarity	CodeWorkout (5-fold averaged)	CodeBLEU 0.593	49
Knowledge Estimation	CodeWorkout	Accuracy 72.32	49
Knowledge Estimation	ProgFeed	Accuracy 77.6	49
Computerized Adaptive Testing	CodeWorkout (test)	Exp.% (med) 15.6	7
