
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning

About

We present NanoFlux, a novel adversarial framework for generating targeted training data to improve LLM reasoning, where adversarially-generated datasets containing fewer than 200 examples outperform conventional fine-tuning approaches. The framework employs a competitive dynamic between models alternating as Attacker and Defender, supervised by a tool-augmented Judge, to synthesize multi-step questions with explanatory annotations that target specific reasoning capabilities. Fine-tuning a 4B-parameter model on NanoFlux-generated data yields performance gains across diverse domains compared to full-benchmark fine-tuning: +5.9% on mathematical reasoning (GSM-Hard), +3.6% on scientific reasoning (GenomeBench), and +16.6% on medical reasoning (MultiMedQA), while reducing computational requirements by 3-14x. Ablation studies reveal a non-monotonic relationship between dataset characteristics and model performance, uncovering domain-specific optimal points for question complexity and reasoning quality. NanoFlux automates training data generation through embedding-based novelty filtering, tool-augmented evaluation, and multi-hop reasoning, suggesting that future model improvements may lie in the intelligent synthesis of small, precisely targeted training datasets.
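The abstract mentions embedding-based novelty filtering as one of the framework's automation components. The paper's implementation is not shown here, but the idea can be sketched as: embed each candidate question and accept it only if its maximum cosine similarity to all previously accepted examples stays below a threshold. The `embed` function below is a toy hashing-based bag-of-words stand-in for a real sentence encoder, and the threshold value is an assumption for illustration.

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy bag-of-words hashing embedding.
    A stand-in for a real sentence encoder, used only for illustration."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def novelty_filter(candidates, threshold=0.8):
    """Keep a candidate question only if its max cosine similarity
    to every previously accepted example is below the threshold."""
    accepted, embeddings = [], []
    for question in candidates:
        e = embed(question)
        if all(cosine(e, prev) < threshold for prev in embeddings):
            accepted.append(question)
            embeddings.append(e)
    return accepted
```

With this sketch, an exact duplicate question embeds identically (similarity 1.0) and is dropped, while a question with disjoint vocabulary passes; a production version would swap in a learned embedding model and tune the threshold per domain.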

Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price • 2025

Related benchmarks

Task                      Dataset       Result (Accuracy)   Rank
Mathematical Reasoning    GSM-Hard      63.3                46
Medical Reasoning         MultiMedQA    61.2                4
Scientific Reasoning      GenomeBench   61.3                4
