
Segment-driven Structural Induction and Semantic Alignment for Heterogeneous Tabular Representation

About

Real-world domains often contain heterogeneous tables whose headers vary while their underlying attribute semantics are shared, making it difficult to induce domain-specialized semantics from table-local evidence alone. Existing encoders model parts of this problem, but often underuse column-level value distributions and apply uniform objectives across attributes with different semantic roles. We propose NAVI, a segment-centric pretraining framework that treats each header-value pair as the unit for aggregating schema-level structural evidence and column-level distributional evidence. We realize this design through Masked Segment Modeling and Entropy-driven Segment Alignment, which jointly enforce structured header-value coupling and semantic alignment across stable and instance-specific attributes. Experiments on heterogeneous in-domain tables show improved reconstruction, semantic consistency, and downstream utility across evaluation settings.
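The abstract does not specify how segments are built, masked, or scored by entropy, so the following is a hypothetical illustration of the general idea, not NAVI's implementation. It assumes a segment is a single (header, value) pair, uses normalized Shannon entropy of a column's values as a stand-in signal for separating "stable" from "instance-specific" attributes, and masks whole segments as reconstruction targets. All function names (`row_to_segments`, `column_entropy`, `split_attributes`, `mask_segments`), the `[MASK]` token, and the 0.5 threshold are assumptions made for this sketch.

```python
import math
import random
from collections import Counter

# Hypothetical representation: a segment is one (header, value) pair from a row.
Segment = tuple[str, str]
MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def row_to_segments(row: dict[str, str]) -> list[Segment]:
    """Split one table row into header-value segments, preserving column order."""
    return list(row.items())


def column_entropy(values: list[str]) -> float:
    """Shannon entropy of a column's values, normalized by log(n) into [0, 1].

    0.0 means every row shares one value; 1.0 means every value is distinct.
    """
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    return h / math.log(n)


def split_attributes(rows: list[dict[str, str]], threshold: float = 0.5):
    """Assumed rule: low-entropy headers are 'stable', high-entropy ones 'instance-specific'."""
    stable, specific = [], []
    for header in rows[0]:
        ent = column_entropy([r[header] for r in rows])
        (stable if ent < threshold else specific).append(header)
    return stable, specific


def mask_segments(segments: list[Segment], ratio: float, rng: random.Random):
    """Mask whole segments (header and value together); return masked input and targets."""
    k = max(1, round(ratio * len(segments)))
    picked = set(rng.sample(range(len(segments)), k))
    masked = [(MASK, MASK) if i in picked else seg for i, seg in enumerate(segments)]
    targets = {i: segments[i] for i in sorted(picked)}
    return masked, targets


if __name__ == "__main__":
    rows = [
        {"title": "Alien", "genre": "Sci-Fi", "format": "DVD"},
        {"title": "Heat", "genre": "Crime", "format": "DVD"},
        {"title": "Arrival", "genre": "Sci-Fi", "format": "DVD"},
        {"title": "Solaris", "genre": "Sci-Fi", "format": "DVD"},
    ]
    stable, specific = split_attributes(rows)
    print(stable, specific)  # ['genre', 'format'] ['title']
    masked, targets = mask_segments(row_to_segments(rows[0]), 0.3, random.Random(0))
    print(masked, targets)
```

Masking header and value together is one way to force a model to recover the pair jointly; masking only the header or only the value would be an equally plausible reading of "header-value coupling" in the abstract.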

Woojun Jung, Susik Yoon • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Header Prediction | Product (test) | Accuracy | 99.97 | 16 |
| Header Prediction | Movie (test) | Accuracy | 99.98 | 16 |
| Row Classification | Product (test) | Macro-F1 (XGBoost) | 94.4 | 11 |
| Row Classification | Movie (test) | Macro-F1 (XGBoost) | 62.9 | 11 |
| Header Prediction | Product | Accuracy | 99.95 | 7 |
| Header Prediction | Movie | Accuracy | 99.98 | 7 |
| Value Imputation | Product | Accuracy | 79.77 | 7 |
| Value Imputation | Movie | Accuracy | 70.77 | 7 |
| Header Clustering | Product domain | NMI | 90.05 | 4 |
| Header Clustering | Movie domain | NMI | 91.44 | 4 |
