LLMs Are Already Good Tutors: Training-Free Prompt Optimization for Pedagogical Math Tutoring

About

Aligning LLMs for math tutoring typically requires RL-based training with multi-GPU infrastructure. We investigate whether training-free prompt optimization, which evolves only the system prompt via API calls, can serve as a practical alternative. We adapt 7 published methods and propose 5 education-specialized methods, evaluating these 12 methods under 5 conditions on 2 OOD benchmark suites. All 12 best-per-method configurations surpass the strongest RL-trained baseline (R_total = 0.633), and our ParetoGrad achieves the best Pareto balance across post-test solve rate, leak control, and helpfulness, rather than dominating any single component. Behavioral analysis with an 82-code educational codebook reveals that training-free methods rely on teaching-knowledge patterns at 2-3x the rate of RL-trained models, with a compensating reduction of about 10 percentage points in intent-level scaffolding. We also find a task-dependent reasoning-mode effect that is consistent across the training-free and RL-based paradigms. Our approach enables efficient development of pedagogically aligned LLM tutors with prompts alone and minimal compute.
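To make the notion of a "Pareto balance" across the three components concrete, here is a minimal sketch of Pareto-front filtering over candidate system prompts. It is not the ParetoGrad method itself, which this summary does not specify. The names `dominates`, `pareto_front`, and `prompt_a` to `prompt_d`, along with all score values, are hypothetical and chosen only for illustration; the sketch also assumes each component is scored so that higher is better.

```python
from typing import Dict, List, Tuple

# Hypothetical score triple: (solve_rate, leak_control, helpfulness).
# Assumption: every component is oriented so that higher is better.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Keep the candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative values only; these are not results from the paper.
candidates = {
    "prompt_a": (0.60, 0.90, 0.70),
    "prompt_b": (0.65, 0.80, 0.75),
    "prompt_c": (0.55, 0.85, 0.65),  # dominated by prompt_a
    "prompt_d": (0.65, 0.80, 0.70),  # dominated by prompt_b
}

front = pareto_front(candidates)
print(front)  # ['prompt_a', 'prompt_b']
```

In this toy example neither `prompt_a` nor `prompt_b` dominates the other: one has better leak control, the other a better solve rate and helpfulness. That is the kind of trade-off a Pareto-based selection keeps visible instead of collapsing it into a single score.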

Unggi Lee, Minchul Shin, Yeil Jeong, Sookbun Lee, Jeongsu Moon, Kyungtae Joo, Eunjoo Lee, Hoilym Kwon (2026)

Related benchmarks

Task          | Dataset                    | Result          | Rank
Math Tutoring | BigMath In-Domain          | Rsol 56.3       | 21
Math Tutoring | OpenLearnLM OOD            | CK 6.97         | 21
Math Tutoring | MTBench MathTutorBench OOD | Score (Sc) 7.89 | 13