Generating Logically Consistent Synthetic Supply Chain Data with LLM-Driven Knowledge Graph Reasoning

About

Synthetic data offers a promising solution to two persistent barriers in supply chain analytics: data scarcity and data privacy. However, for synthetic data to support operational simulation and decision-making, it must do more than reproduce the statistical distributions of real records. It must also preserve the operational logic that governs supply chain processes, including the temporal orderings, mathematical dependencies, hierarchical taxonomies, and conditional rules that make a record operationally plausible. We regard this logic as the "physics" of supply chain data. Existing tabular generative models are primarily optimized for distributional fidelity and downstream predictive utility, and therefore often generate records that appear statistically realistic but violate fundamental operational constraints. This paper introduces TabKG, a knowledge-graph-guided framework for generating logically consistent synthetic supply chain tabular data. TabKG constructs a Column Relationship Knowledge Graph (CR-KG) to represent the operational dependencies among columns. It uses a multi-LLM ensemble with majority voting to propose candidate relationships from column metadata, validates these relationships against real data to remove hallucinated or unsupported edges, and then uses the validated CR-KG to guide generation. Specifically, TabKG compresses the original table into its independent columns, generates these columns using a latent diffusion model, and deterministically reconstructs the dependent columns according to the validated relationships. This enforces logical consistency by construction with respect to the discovered operational rules.

Yunbo Long, Ge Zheng, Liming Xu, Alexandra Brintrup • 2026
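
The propose, validate, and reconstruct steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The column names, the two relationship kinds (`product` and `sum`), and the hard-coded proposal lists standing in for LLM output are all hypothetical, and the latent diffusion model is omitted: a hand-written row stands in for the generated independent columns.

```python
from collections import Counter
from math import isclose

# Hypothetical relationship kinds; the paper's CR-KG covers more (temporal, hierarchical, conditional).
OPS = {"product": lambda a, b: a * b, "sum": lambda a, b: a + b}

# Hypothetical edges: (kind, dependent column, (source column, source column)).
A = ("product", "subtotal", ("quantity", "unit_price"))
B = ("sum", "total", ("subtotal", "shipping"))
C = ("sum", "total", ("unit_price", "shipping"))       # plausible-looking but wrong
D = ("product", "shipping", ("quantity", "unit_price"))  # proposed by one model only

# Stand-in for the proposals of three LLMs (hard-coded, no model is called).
PROPOSALS = [[A, B, C], [A, C, D], [A, B]]

# Stand-in for the real table used to validate candidate edges.
REAL_ROWS = [
    {"quantity": 2, "unit_price": 5.0, "shipping": 1.5, "subtotal": 10.0, "total": 11.5},
    {"quantity": 3, "unit_price": 4.0, "shipping": 2.0, "subtotal": 12.0, "total": 14.0},
]

def majority_vote(proposals):
    """Keep edges proposed by more than half of the proposers."""
    counts = Counter(edge for p in proposals for edge in set(p))
    quorum = len(proposals) // 2 + 1
    return sorted(edge for edge, n in counts.items() if n >= quorum)

def validate(edges, rows):
    """Drop edges that any real row contradicts."""
    return [
        (kind, target, (a, b))
        for kind, target, (a, b) in edges
        if all(isclose(OPS[kind](r[a], r[b]), r[target], abs_tol=1e-9) for r in rows)
    ]

def reconstruct(independent_row, edges):
    """Fill in dependent columns deterministically, resolving edges as their sources become available."""
    row = dict(independent_row)
    pending = list(edges)
    while pending:
        ready = [e for e in pending if all(s in row for s in e[2])]
        if not ready:
            raise ValueError("unresolvable dependencies")
        for kind, target, (a, b) in ready:
            row[target] = OPS[kind](row[a], row[b])
        pending = [e for e in pending if e not in ready]
    return row

voted = majority_vote(PROPOSALS)      # D is dropped: only one proposer
graph = validate(voted, REAL_ROWS)    # C is dropped: contradicted by the real rows
# Hand-written stand-in for a row of generated independent columns.
synthetic = reconstruct({"quantity": 4, "unit_price": 2.5, "shipping": 3.0}, graph)
print(synthetic)
```

In this toy setup, voting removes the edge only one proposer suggested, and validation removes the edge that the real rows contradict. Because `subtotal` and `total` are computed rather than sampled, every reconstructed row satisfies the surviving rules.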

Related benchmarks

Task                                   | Dataset    | Result                   | Rank
---------------------------------------|------------|--------------------------|-----
Classification                         | Purchasing | AUC 63.45                | 13
Synthetic Tabular Data Generation      | Retailing  | Density Estimation 96.46 | 11
Inter-column relationship preservation | Retailing  | HCS 97.84                | 11
Inter-column relationship preservation | Purchasing | HCS 98.41                | 11
Synthetic Tabular Data Generation      | Purchasing | Density Estimation 98.14 | 11
Privacy Preservation                   | Retailing  | DCR 90.03                | 11
Privacy Preservation                   | Purchasing | DCR 90.43                | 11
late-delivery risk prediction          | Retail     | AUC 71.9                 | 6