Synthcity: facilitating innovative use cases of synthetic data in different data modalities
About
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity gives practitioners a single access point to cutting-edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop shop for SOTA benchmarks, and an opportunity to extend research impact. The library is available on GitHub (https://github.com/vanderschaarlab/synthcity) and PyPI (https://pypi.org/project/synthcity/). We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Membership Inference Attack | Abalone | AUC-ROC | 60 | 10 |
| Membership Inference Attack | CA Housing | AUC-ROC | 0.7 | 8 |
| Membership Inference Attack | CASP | AUC-ROC | 0.72 | 8 |
| Membership Inference Attack | Diabetes | AUC-ROC | 0.66 | 8 |
| Membership Inference Attack | Faults | AUC-ROC | 0.61 | 8 |
| Synthetic Data Generation (Tabular Classification Utility) | Average of 5 datasets | Synthetic Score | 79.7 | 5 |
| Synthetic Data Privacy Evaluation | Multiple datasets Average | Discriminator AUC | 0.91 | 5 |
| Synthetic Data Generation | Multiple benchmark datasets Average | Training Time (s) | 268.5 | 5 |
| Tabular Data Synthesis Fidelity | 5 datasets Average performance | Overall Score | 0.832 | 5 |
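
To help read the AUC-ROC figures reported for the membership inference task, here is a minimal, self-contained sketch of how that metric can be computed from attack scores. It is not Synthcity's implementation and does not reflect how the benchmark numbers above were produced. The function name `auc_roc` and the sample scores are hypothetical, chosen only for illustration. An AUC-ROC of 0.5 means the attacker does no better than chance at telling training records from held-out records; 1.0 means perfect separation.

```python
def auc_roc(scores, labels):
    """Pairwise AUC-ROC: the probability that a randomly chosen member
    receives a higher attack score than a randomly chosen non-member.
    Ties count as half a win. (Hypothetical helper, not a Synthcity API.)"""
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")

    wins = 0.0
    for m in members:
        for n in non_members:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(members) * len(non_members))


# Illustrative attack scores (made up): higher means the attacker
# believes the record was part of the generator's training set.
attack_scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]
is_member = [1, 1, 1, 0, 0, 0]  # ground truth: 1 = training record

auc = auc_roc(attack_scores, is_member)
print(f"Membership inference AUC-ROC: {auc:.3f}")
```

The pairwise form is quadratic in the number of records, which is fine for a small illustration; a rank-based computation would be the usual choice for larger evaluations.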