Synthcity: facilitating innovative use cases of synthetic data in different data modalities

About

Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity gives practitioners a single access point to cutting-edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop shop for state-of-the-art (SOTA) benchmarks, and an opportunity to extend research impact. The library is available on GitHub (https://github.com/vanderschaarlab/synthcity) and can be installed with pip from PyPI (https://pypi.org/project/synthcity/). We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.

Zhaozhi Qian, Bogdan-Constantin Cebere, Mihaela van der Schaar • 2023

Related benchmarks

Task	Dataset	Metric	Result
Membership Inference Attack	Abalone	AUC-ROC	60	10
Membership Inference Attack	CA Housing	AUC-ROC	0.7	8
Membership Inference Attack	CASP	AUC-ROC	0.72	8
Membership Inference Attack	Diabetes	AUC-ROC	0.66	8
Membership Inference Attack	Faults	AUC-ROC	0.61	8
Synthetic Data Generation (Tabular Classification Utility)	Average of 5 datasets	Synthetic Score	79.7	5
Synthetic Data Privacy Evaluation	Average over multiple datasets	Discriminator AUC	0.91	5
Synthetic Data Generation	Average over multiple benchmark datasets	Training Time (s)	268.5	5
Tabular Data Synthesis Fidelity	Average over 5 datasets	Overall Score	0.832	5
