
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning

About

We present NanoFlux, a novel adversarial framework for generating targeted training data to improve LLM reasoning, where adversarially-generated datasets containing fewer than 200 examples outperform conventional fine-tuning approaches. The framework employs a competitive dynamic between models alternating as Attacker and Defender, supervised by a tool-augmented Judge, to synthesize multi-step questions with explanatory annotations that target specific reasoning capabilities. Fine-tuning a 4B-parameter model on NanoFlux-generated data yields performance gains across diverse domains compared to full-benchmark fine-tuning: +5.9% on mathematical reasoning (GSM-Hard), +3.6% on scientific reasoning (GenomeBench), and +16.6% on medical reasoning (MultiMedQA), while reducing computational requirements by 3-14x. Ablation studies reveal a non-monotonic relationship between dataset characteristics and model performance, uncovering domain-specific optimal points for question complexity and reasoning quality. NanoFlux automates training data generation through embedding-based novelty filtering, tool-augmented evaluation, and multi-hop reasoning, suggesting that future model improvements may lie in the intelligent synthesis of small, precisely targeted training datasets.
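The abstract mentions embedding-based novelty filtering as one of the framework's automation components. The paper's implementation is not shown here, but the idea can be sketched as: embed each candidate question and accept it only if its maximum cosine similarity to all previously accepted examples stays below a threshold. The `embed` function below is a toy hashing-based bag-of-words stand-in for a real sentence encoder, and the threshold value is an assumption for illustration.

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy bag-of-words hashing embedding.
    A stand-in for a real sentence encoder, used only for illustration."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def novelty_filter(candidates, threshold=0.8):
    """Keep a candidate question only if its max cosine similarity
    to every previously accepted example is below the threshold."""
    accepted, embeddings = [], []
    for question in candidates:
        e = embed(question)
        if all(cosine(e, prev) < threshold for prev in embeddings):
            accepted.append(question)
            embeddings.append(e)
    return accepted
```

With this sketch, an exact duplicate question embeds identically (similarity 1.0) and is dropped, while a question with disjoint vocabulary passes; a production version would swap in a learned embedding model and tune the threshold per domain.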

Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price • 2025

Related benchmarks

Task                      Dataset       Result (Accuracy)   Rank
Mathematical Reasoning    GSM-Hard      63.3                46
Medical Reasoning         MultiMedQA    61.2                4
Scientific Reasoning      GenomeBench   61.3                4
