LLM-AUG: Robust Wireless Data Augmentation with In-Context Learning in Large Language Models

About

Data scarcity remains a fundamental bottleneck in applying deep learning to wireless communication problems, particularly in scenarios where collecting labeled Radio Frequency (RF) data is expensive, time-consuming, or operationally constrained. This paper proposes LLM-AUG, a data augmentation framework that leverages in-context learning in large language models (LLMs) to generate synthetic training samples directly in a learned embedding space. Unlike conventional generative approaches that require training task-specific models, LLM-AUG performs data generation through structured prompting, enabling rapid adaptation in low-shot regimes. We evaluate LLM-AUG on two representative tasks, modulation classification and interference classification, using the RadioML 2016.10A dataset and the Interference Classification (IC) dataset, respectively. Results show that LLM-AUG consistently outperforms traditional augmentation and deep generative baselines across low-shot settings and reaches near-oracle performance using only 15% labeled data. LLM-AUG further demonstrates improved robustness under distribution shifts, yielding a 29.4% relative gain over diffusion-based augmentation at a lower SNR value. On the RadioML and IC datasets, LLM-AUG yields relative gains of 67.6% and 35.7%, respectively, over the diffusion-based baseline. The t-SNE visualizations further validate that synthetic samples generated by LLM-AUG better preserve class structure in the embedding space, leading to more consistent and informative augmentations. These results demonstrate that LLMs can serve as effective and practical data augmenters for wireless machine learning, enabling robust and data-efficient learning in evolving wireless environments.
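The abstract reports several results as relative gains over a baseline. As a minimal sketch, assuming the usual definition (the difference between the two scores divided by the baseline score), the calculation looks like the following. The function name `relative_gain` and the example scores are hypothetical and are not taken from the paper, which may define the metric differently.

```python
def relative_gain(method_score: float, baseline_score: float) -> float:
    """Return the percentage improvement of method_score over baseline_score.

    Assumes the common definition: 100 * (method - baseline) / baseline.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return 100.0 * (method_score - baseline_score) / baseline_score


# Hypothetical F1 scores for illustration only (not results from the paper).
gain = relative_gain(method_score=60.0, baseline_score=50.0)
print(f"{gain:.1f}%")  # prints "20.0%"
```

Under this definition, a relative gain is measured against the baseline score, so it is not the same as the absolute difference in F1 points between the two methods.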

Pranshav Gajjar, Manan Tiwari, Sayanta Seth, Vijay K. Shah • 2026

Related benchmarks

Task	Dataset	Result
Interference Classification	IC dataset 25 s/cls	F1 Score: 79.9	14
Interference Classification	IC dataset 50 s/cls	F1 Score: 84.8	14
Modulation Classification	RadioML 10 s/cls 2016.10A	F1 Score: 37	14
Modulation Classification	RadioML 25 s/cls 2016.10A	F1 Score: 48.3	14
Modulation Classification	RadioML 50 s/cls 2016.10A	F1 Score: 48.9	14
Interference Classification	IC dataset 10 s/cls	F1 Score: 52.7	14

