Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation
About
Distractor generation (DG) remains a labor-intensive task that depends heavily on domain experts. The task is to generate plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning when they try to identify the correct answer. A recent method fine-tunes pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer pair, but it often fails to capture the reasoning process that experts use when selecting distractors in benchmarks. In this paper, we explore the reasoning abilities of large language models (LLMs) for DG through in-context learning, using unsupervised semantic retrieval to select few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer pair. Extensive experiments on six benchmarks, spanning different domains and average distractor lengths, demonstrate that prompting LLMs with few-shot examples substantially improves performance over recent DG models. The approach achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.
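The pipeline described above (retrieve semantically similar solved examples, then prompt an LLM to produce rationales alongside distractors) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the example pool, the prompt wording, the rationale-before-distractors ordering, and the function names `select_few_shot` and `build_prompt` are all hypothetical, and a bag-of-words cosine similarity stands in for the pretrained sentence encoder that an unsupervised semantic retriever would normally use. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy pool of solved examples (question, answer, distractors, rationale).
# In practice this would be drawn from a benchmark's training split.
POOL = [
    {
        "question": "What gas do plants absorb during photosynthesis?",
        "answer": "carbon dioxide",
        "distractors": ["oxygen", "nitrogen", "water vapor"],
        "rationale": "Oxygen is a product of photosynthesis that students often confuse "
                     "with the input; nitrogen and water vapor are other familiar gases.",
    },
    {
        "question": "Which planet is closest to the Sun?",
        "answer": "Mercury",
        "distractors": ["Venus", "Earth", "Mars"],
        "rationale": "Venus is the hottest planet and is often mistaken for the closest; "
                     "Earth and Mars are familiar inner planets.",
    },
    {
        "question": "Which organ pumps blood through the human body?",
        "answer": "heart",
        "distractors": ["lungs", "liver", "kidneys"],
        "rationale": "The lungs work closely with the heart in circulation; the liver "
                     "and kidneys filter blood rather than pump it.",
    },
]


def embed(text):
    """Stand-in encoder (assumption): a bag-of-words term-count vector.
    A real retriever would use a pretrained sentence encoder instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_few_shot(question, answer, pool, k=2):
    """Return the k pool examples most similar to the query question-answer pair."""
    query = embed(f"{question} {answer}")
    ranked = sorted(
        pool,
        key=lambda ex: cosine(query, embed(f"{ex['question']} {ex['answer']}")),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, answer, examples, n_distractors=3):
    """Assemble a few-shot prompt; each demonstration gives a rationale before its distractors."""
    blocks = [
        f"Write {n_distractors} plausible but incorrect options (distractors) for each "
        "question. First explain your reasoning, then list the distractors."
    ]
    for ex in examples:
        blocks.append(
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
            f"Rationale: {ex['rationale']}\n"
            f"Distractors: {'; '.join(ex['distractors'])}"
        )
    # The target item ends at "Rationale:" so the model writes its reasoning first.
    blocks.append(f"Question: {question}\nAnswer: {answer}\nRationale:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    q, a = "Which gas do plants release during photosynthesis?", "oxygen"
    shots = select_few_shot(q, a, POOL)
    print(build_prompt(q, a, shots))
```

For this query the photosynthesis example ranks first because it shares the most terms with the question. Replacing `embed` and `cosine` with a sentence encoder and dense-vector similarity would keep the same selection logic.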
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | ARC Challenge | -- | 598 |
| Question Answering | MedQA | -- | 86 |
| Multiple-choice Question Answering | MCQA | -- | 25 |
| Question Answering | SciQ | -- | 15 |
| Question Answering | MCQL | -- | 14 |
| Distractor Generation | MCQ | P@1 = 30.5 | 12 |
| Distractor Generation | SciQ | P@1 = 25.5 | 12 |
| Distractor Generation | MCQL | P@1 = 36.17 | 12 |
| Distractor Generation | MedQA | P@1 = 21.05 | 12 |
| Distractor Generation | ARC Easy | P@1 = 23.7 | 12 |
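The distractor generation rows report P@1 scores. A minimal sketch of how such a score could be computed, assuming P@1 counts a question as a hit when the top-ranked generated distractor matches one of the gold distractors after lowercasing and whitespace normalization, and that the score is reported as a percentage. The benchmarks' official matching rules may differ, and `normalize` and `precision_at_1` are hypothetical helpers.

```python
def normalize(text):
    """Lowercase and collapse internal whitespace (assumed matching rule)."""
    return " ".join(text.lower().split())


def precision_at_1(predictions, references):
    """P@1 as a percentage.

    predictions: per-question lists of generated distractors, best-ranked first.
    references: per-question lists of gold distractors.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for ranked, gold in zip(predictions, references):
        gold_set = {normalize(g) for g in gold}
        # Only the top-ranked distractor counts; an empty list is a miss.
        if ranked and normalize(ranked[0]) in gold_set:
            hits += 1
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = [["Oxygen", "water"], ["Venus"], ["lungs", "brain"]]
    golds = [["oxygen", "nitrogen"], ["Venus", "Mars"], ["brain", "liver"]]
    print(f"P@1 = {precision_at_1(preds, golds):.2f}")  # prints P@1 = 66.67
```

Exact matching is strict for longer distractors, which is one reason scores can vary with a dataset's average distractor length.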