
Training with Pseudo-Code for Instruction Following

About

Despite rapid advances in the capabilities of Large Language Models (LLMs), they continue to struggle with following relatively simple and unambiguous instructions, particularly when compositional structure is involved. Recent work suggests that models may follow instructions more effectively when the instructions are expressed in pseudo-code rather than natural language. However, writing pseudo-code programs is tedious, and relying on few-shot demonstrations or inference-time code prompting is often unnatural for non-expert users of LLMs. To overcome these limitations, we propose a training-time approach that fine-tunes LLMs on instruction-tuning data augmented with pseudo-code representations of natural language instructions paired with final responses. We evaluate our method on 12 publicly available benchmarks spanning instruction following, mathematical reasoning, and commonsense reasoning, across six base models. Our results show that models trained with pseudo-code follow instructions more reliably, achieving relative gains of 8-21% on instruction-following benchmarks, while largely preserving, and in some cases improving, performance on mathematical and commonsense reasoning tasks, with an average gain of up to 30% across all evaluated benchmarks.
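The augmentation idea can be sketched in a few lines: each natural-language instruction is paired with a pseudo-code rendering of the same instruction, and that combined prompt is mapped to the original target response for supervised fine-tuning. The field names, prompt layout, and pseudo-code style below are assumptions for illustration, not the authors' exact schema.

```python
def augment_with_pseudocode(sample: dict, pseudocode: str) -> dict:
    """Build an instruction-tuning example whose prompt carries both the
    natural-language instruction and an assumed pseudo-code form of it.
    The 'prompt'/'response' schema is hypothetical, not from the paper."""
    prompt = (
        f"Instruction:\n{sample['instruction']}\n\n"
        f"Pseudo-code:\n{pseudocode}\n"
    )
    # The target response is unchanged; only the prompt side is augmented.
    return {"prompt": prompt, "response": sample["response"]}


# Example instruction with compositional constraints (ordering + formatting).
example = {
    "instruction": "List three prime numbers, one per line, in ascending order.",
    "response": "2\n3\n5",
}
# A hand-written pseudo-code rendering of the same instruction.
pseudo = (
    "primes = first_k_primes(k=3)\n"
    "sort(primes, order=ascending)\n"
    "for p in primes: print(p)"
)

augmented = augment_with_pseudocode(example, pseudo)
```

A fine-tuning corpus built this way would mix such augmented examples with ordinary instruction-response pairs, so the model learns the mapping without requiring users to supply pseudo-code at inference time.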

Prince Kumar, Rudra Murthy, Riyaz Bhat, Danish Contractor • 2025

Related benchmarks

Task                   | Dataset                                                                 | Result                  | Rank
Commonsense Reasoning  | Commonsense Reasoning Suite (test)                                      | HellaSwag Accuracy 0.71 | 62
General LLM Evaluation | Instruction-Following, Mathematics, and Commonsense Reasoning Combined  | Average Score 57        | 18
Mathematics            | Mathematics Suite                                                       | GSM8K Accuracy 73       | 18
Instruction Following  | Instruction-Following Suite                                             | IFEval Score 48         | 18
