
DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data

About

The development of robust clinical decision support systems is frequently impeded by the scarcity of high-fidelity, privacy-preserving biomedical data. While generative Large Language Models (LLMs) offer a promising avenue for synthetic data generation, they often struggle to capture the complex, non-linear dependencies and severe class imbalances inherent in Electronic Health Records (EHRs), leading to statistically plausible but clinically invalid records. To bridge this gap, we introduce DISCO-TAB (DIScriminator-guided COntrol for TABular synthesis), a novel framework that orchestrates a fine-tuned LLM with a multi-objective discriminator system optimized via Reinforcement Learning. Unlike prior methods that rely on scalar feedback, DISCO-TAB evaluates synthesis at four granularities (token, sentence, feature, and row), while integrating Automated Constraint Discovery and Inverse-Frequency Reward Shaping to autonomously preserve latent medical logic and resolve minority-class collapse. We rigorously validate our framework across diverse benchmarks, including high-dimensional, small-sample medical datasets (e.g., Heart Failure, Parkinson's). Our results demonstrate that hierarchical feedback yields state-of-the-art performance, achieving up to a 38.2% improvement in downstream clinical classifier utility compared to GAN and diffusion baselines, while ensuring exceptional statistical fidelity (JSD < 0.01) and robust resistance to membership inference attacks. This work establishes a new standard for generating trustworthy, utility-preserving synthetic tabular data for sensitive healthcare applications.
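To make the idea of inverse-frequency reward shaping concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a per-row reward is simply multiplied by a class weight proportional to the inverse of the class frequency; the function names `inverse_frequency_weights` and `shaped_reward`, the normalisation, and the toy labels are all hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Assumption: weights are scaled as n / (k * count), where n is the
    number of samples and k the number of classes, so that the
    sample-weighted mean of the weights equals 1.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def shaped_reward(base_reward, label, weights):
    """Scale a base reward by the weight of the row's class (hypothetical rule)."""
    return base_reward * weights[label]


# Toy, made-up label column with an 80/20 imbalance.
labels = ["survived"] * 8 + ["died"] * 2
weights = inverse_frequency_weights(labels)

# Rows of the minority class receive a larger reward for the same base score.
minority_reward = shaped_reward(1.0, "died", weights)
majority_reward = shaped_reward(1.0, "survived", weights)
```

Under this assumed scheme, a generator that ignores the minority class gives up the most heavily weighted rewards, which is one plausible way to discourage minority-class collapse.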

Arshia Ilaty, Hossein Shirazi, Amir Rahmani, Hajar Homayouni • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Classification | German Credit | F1 Score 92.5 | 15 |
| Machine Learning | bank-marketing | F1 Score 87.1 | 15 |
| Downstream ML Utility | Heart Failure | F1-score 100 | 8 |
| Downstream ML Utility | Breast cancer | F1-score 99.4 | 8 |
| Downstream ML Utility | liver-disorders | F1-score 99 | 8 |
| Downstream ML Utility | Parkinsons | F1-score 96.6 | 8 |
| Downstream ML Utility | Obesity | F1-score 92.9 | 8 |
| Tabular Synthetic Data Generation | Heart Failure | KS Statistic 0.022 | 8 |
| Tabular Synthetic Data Generation | Breast cancer | KS Statistic 0.025 | 8 |
| Tabular Synthetic Data Generation | liver-disorders | KS Statistic 0.017 | 8 |
Showing 10 of 33 rows
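
The fidelity metrics named on this page (the KS statistic in the benchmark rows and the JSD in the abstract) can be illustrated with a short, standard-library-only sketch. This is not the evaluation code behind the reported numbers: the function names are hypothetical, the JSD uses base-2 logarithms (so it lies in [0, 1]), and how the benchmark aggregates per-column scores is not stated here.

```python
import bisect
import math
from collections import Counter


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    It is the largest absolute gap between the two empirical CDFs,
    which can only change at observed values.
    """
    a, b = sorted(real), sorted(synth)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def js_divergence(real, synth):
    """Jensen-Shannon divergence (base 2) for one categorical column."""
    p, q = Counter(real), Counter(synth)
    total = 0.0
    for cat in set(p) | set(q):
        pc, qc = p[cat] / len(real), q[cat] / len(synth)
        m = 0.5 * (pc + qc)  # mixture distribution
        if pc > 0:
            total += 0.5 * pc * math.log2(pc / m)
        if qc > 0:
            total += 0.5 * qc * math.log2(qc / m)
    return total


# Toy, made-up columns: identical samples score 0, disjoint samples score 1.
ks_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
ks_disjoint = ks_statistic([1, 2], [3, 4])
jsd_same = js_divergence(["a", "b"], ["a", "b"])
jsd_disjoint = js_divergence(["a"], ["b"])
```

For both metrics, lower values mean the synthetic column tracks the real one more closely.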
