LegalDrill: Diagnosis-Driven Synthesis for Legal Reasoning in Small Language Models
About
Small language models (SLMs) are promising for real-world deployment due to their efficiency and low operational cost. However, their limited capacity makes them struggle with high-stakes legal reasoning tasks that require coherent statute interpretation and logically consistent deduction. Furthermore, training SLMs for such tasks demands high-quality, concise reasoning trajectories, which are prohibitively expensive to collect manually and difficult to curate via standard rejection sampling, since rejection sampling offers no granularity beyond the final verdict. To address these challenges, we propose LegalDrill, a diagnosis-driven synthesis framework that extracts and iteratively refines reasoning trajectories from a capable teacher via fine-grained prompting, and then applies self-reflective verification to adaptively select the most effective data for the SLM student. The resulting data are used to train the SLM through supervised fine-tuning and direct preference optimization. Extensive experiments on several legal benchmarks demonstrate that LegalDrill significantly improves the legal reasoning capabilities of representative SLMs without requiring scarce expert annotations, paving a scalable path toward practical legal reasoning systems.
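As a rough illustration of the pipeline described above, the following minimal sketch shows one way a diagnose-and-refine loop could yield both supervised fine-tuning examples and preference pairs. It is not the LegalDrill implementation: the names `Trajectory`, `refine`, `to_training_data`, `toy_generate`, and `toy_diagnose` are hypothetical, the teacher is replaced by a stub function, and the diagnosis step is reduced to a comparison against a gold answer, which is an assumption made only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Trajectory:
    question: str
    reasoning: str
    answer: str


# Hypothetical callables standing in for the teacher model and the diagnosis step.
Generate = Callable[[str, Optional[str]], Trajectory]
Diagnose = Callable[[Trajectory, str], Optional[str]]


def refine(question: str, gold: str, generate: Generate, diagnose: Diagnose,
           max_rounds: int = 3) -> Tuple[Optional[Trajectory], List[Trajectory]]:
    """Query the teacher, diagnose the result, and retry with feedback.

    Returns the first trajectory that passes diagnosis (or None if none
    does within max_rounds) together with all trajectories that failed.
    """
    rejected: List[Trajectory] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        traj = generate(question, feedback)
        feedback = diagnose(traj, gold)  # None means no problem was found
        if feedback is None:
            return traj, rejected
        rejected.append(traj)
    return None, rejected


def to_training_data(accepted: Optional[Trajectory], rejected: List[Trajectory]
                     ) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Turn one accepted trajectory into an SFT example and preference pairs."""
    if accepted is None:
        return [], []
    chosen = f"{accepted.reasoning}\nAnswer: {accepted.answer}"
    sft = [{"prompt": accepted.question, "completion": chosen}]
    dpo = [{"prompt": accepted.question,
            "chosen": chosen,
            "rejected": f"{r.reasoning}\nAnswer: {r.answer}"}
           for r in rejected]
    return sft, dpo


def toy_generate(question: str, feedback: Optional[str]) -> Trajectory:
    # Stub teacher: answers wrongly at first, correctly once it gets feedback.
    if feedback is None:
        return Trajectory(question, "The clause is silent on sharing.", "No")
    return Trajectory(question, "The clause permits sharing with affiliates.", "Yes")


def toy_diagnose(traj: Trajectory, gold: str) -> Optional[str]:
    # Stub diagnosis (assumption): only checks the final answer against gold.
    if traj.answer == gold:
        return None
    return f"Final answer should be {gold}; recheck the clause."
```

In this sketch, failed attempts are kept rather than discarded so they can serve as the "rejected" side of preference pairs, while the accepted trajectory doubles as the supervised target. A real diagnosis step would presumably inspect the reasoning itself rather than only the final answer.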
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Entailment | Privacy Policy Entailment (Priv. Ent.) | Accuracy | 85 | 11 |
| Question Answering | Contracts QA | Accuracy | 97 | 11 |
| Entailment | Sara Entailment | Accuracy | 75 | 11 |
| Question Answering | Consumer QA (Cos. QA) | Accuracy | 96 | 11 |
| Legal Reasoning | Real-World POA | Accuracy | 92 | 5 |
| Legal Reasoning | Real-World Trust | Accuracy | 90 | 5 |