CodeGENCAT: Generative Computerized Adaptive Testing for Open-ended Coding Problems

About

Existing Computerized Adaptive Testing (CAT) frameworks typically select questions based on the predicted likelihood that the student will answer correctly. This design ignores information contained in students' open-ended responses, especially in domains such as programming education, where code structures and bugs contain rich information on student knowledge. In this work, we propose Code GENerative CAT (CodeGENCAT), a generative CAT framework that selects questions using predicted student code responses. First, we develop a Generative Item Response Theory (GIRT) model that generates code responses conditioned on estimated student knowledge, trained with supervised fine-tuning followed by direct preference optimization for knowledge-response alignment. Second, we introduce three question-selection algorithms that measure uncertainty, coding style diversity, and information from predicted student code responses. Experiments on two real-world programming education datasets show that CodeGENCAT outperforms all CAT baselines, achieving an AUC improvement of up to 4.32% over the strongest baseline in the early stages of adaptive testing.
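
For context, the conventional correctness-based selection that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a standard two-parameter logistic (2PL) IRT selector that picks the item with maximum Fisher information at the current ability estimate. It is not the CodeGENCAT method or the GIRT model, and the item names and parameter values are made up for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability that a student with ability theta answers correctly.

    a is the item discrimination, b is the item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items, administered):
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = [name for name in items if name not in administered]
    return max(candidates, key=lambda name: fisher_information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
items = {"q1": (1.0, -1.0), "q2": (1.0, 0.0), "q3": (1.0, 2.0)}

# With equal discriminations, the item whose difficulty is closest to the
# current ability estimate carries the most information.
next_item = select_next_item(0.1, items, administered={"q1"})
```

A selector of this kind sees only a binary correct/incorrect signal per item, which is the limitation the abstract points to for open-ended code responses.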

Wanyong Feng, Alexander Scarlatos, Ruochen Sun, Andrew Lan • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Code Similarity | CodeWorkout (5-fold averaged) | CodeBLEU | 0.593 | 49
Knowledge Estimation | CodeWorkout | Accuracy | 72.32 | 49
Knowledge Estimation | ProgFeed | Accuracy | 77.6 | 49
Computerized Adaptive Testing | CodeWorkout (test) | Exp.% (med) | 15.6 | 7