JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
About
High-stakes synthetic data generation faces a fundamental Quadrilemma: simultaneously achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost. State-of-the-art Deep Generative Models such as CTGAN and TabDDPM excel at fidelity but rely on inefficient rejection sampling to enforce continuous range constraints. Conversely, Structural Causal Models offer logical control but struggle with high-dimensional fidelity and complex noise inversion. We introduce JANUS (Joint Ancestral Network for Uncertainty and Synthesis), a framework that unifies these capabilities using a DAG of Bayesian Decision Trees. Our key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph and achieves 100% constraint satisfaction on feasible constraint sets without rejection sampling. It is paired with an Analytical Uncertainty Decomposition derived from Dirichlet priors, which enables 128x faster uncertainty estimation than Monte Carlo methods. Across 15 datasets and 523 constrained scenarios, JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and handles complex inter-column constraints (e.g., Salary_offered >= Salary_requested) exactly, where baselines fail entirely.
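To illustrate why a Dirichlet prior allows uncertainty to be computed in closed form rather than by sampling, here is a minimal sketch using the standard entropy-based decomposition (total = aleatoric + epistemic) for a single categorical leaf. It is not taken from the JANUS implementation: the function names `digamma` and `decompose_uncertainty`, the symmetric prior, and the choice of entropy as the uncertainty measure are all assumptions made for illustration.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def decompose_uncertainty(counts, prior=1.0):
    """Closed-form uncertainty for a leaf with a Dirichlet posterior.

    counts: observed class counts in the leaf (hypothetical input).
    prior:  symmetric Dirichlet concentration (assumed, not from the paper).
    """
    alpha = [c + prior for c in counts]
    alpha0 = sum(alpha)
    mean = [a / alpha0 for a in alpha]
    # Total uncertainty: entropy of the posterior-mean class distribution.
    total = -sum(p * math.log(p) for p in mean if p > 0)
    # Aleatoric uncertainty: expected entropy under the Dirichlet posterior.
    aleatoric = digamma(alpha0 + 1.0) - sum(
        p * digamma(a + 1.0) for p, a in zip(mean, alpha)
    )
    # Epistemic uncertainty: mutual information between label and parameters.
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}


if __name__ == "__main__":
    for counts in ([0, 0], [5, 5], [500, 500]):
        print(counts, decompose_uncertainty(counts))
```

In this sketch the epistemic term shrinks as the leaf accumulates observations while the aleatoric term approaches the entropy of the underlying class distribution; a Monte Carlo estimate of the same quantities would instead require drawing many samples from the Dirichlet posterior.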
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Constrained Generation | Experimental Grid Oracle SEM (All experiments) | Score | 93.9 | 7 |
| Unconditional Tabular Data Generation | Synthcity (15 datasets average: Adult, Credit, Bank, Wine Quality, Car Evaluation, SPECTF Heart, Communities Crime, Law Students, Student Performance, Circle, Multivariate Normal, Imbalanced, Mixed, Iris, Complex Stress) | MMD | 0.012 | 7 |
| Unconditional Generation | Synthcity Average of 15 datasets: Adult, Credit, Bank, Wine Quality, Car Evaluation, SPECTF Heart, Communities Crime, Law Students, Student Performance, Circle, Multivariate Normal, Imbalanced, Mixed, Iris, Complex Stress (test) | Feature Correlation | 2.42 | 7 |
| Counterfactual reasoning | Chain NADD | MSE | 48.8 | 5 |
| Counterfactual reasoning | Triangle NADD | MSE | 150 | 5 |
| Counterfactual reasoning | Diamond NADD | MSE | 599 | 5 |
| Counterfactual reasoning | Y-struct NADD | MSE | 1.65e+4 | 5 |
| Noise Detection | Synthetic setup 50% injected label noise | Detection Ratio | 1.17 | 4 |
| Synthetic Data Generation | Adult, Bank Marketing, and Credit Default (aggregated) | Avg F1 | 59.1 | 3 |