ZIN: When and How to Learn Invariance Without Environment Partition?

About

It is commonplace to encounter heterogeneous data, in which some aspects of the data distribution may vary while the underlying causal mechanisms remain constant. When data are divided into distinct environments according to this heterogeneity, recent invariant learning methods have proposed to learn robust and invariant models based on the environment partition. It is hence tempting to utilize the inherent heterogeneity even when no environment partition is provided. Unfortunately, in this work, we show that learning invariant features under this circumstance is fundamentally impossible without further inductive biases or additional information. We then propose a framework that jointly learns the environment partition and the invariant representation, assisted by additional auxiliary information. We derive sufficient and necessary conditions for our framework to provably identify invariant features under a fairly general setting. Experimental results on both synthetic and real-world datasets validate our analysis and demonstrate improved performance of the proposed framework over existing methods. Finally, our results also highlight the need to make the role of inductive biases more explicit in future work on learning invariant models without environment partition. Code is available at https://github.com/linyongver/ZIN_official.

Yong Lin, Shengyu Zhu, Lu Tan, Peng Cui • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Temporal heterogeneity synthetic datasets | Mean Accuracy | 87.5 | 30 |
| Smiling Classification | CelebA (test) | Accuracy | 76.29 | 18 |
| House price prediction | Kaggle House Price Mean (test) | MSE | 0.3339 | 8 |
| House price prediction | Kaggle House Price Worst (test) | MSE | 0.4815 | 8 |
| Smiling Classification | CelebA (train) | Accuracy | 90.62 | 8 |
| House price prediction | Kaggle House Price (train) | MSE | 0.2275 | 8 |
| Land Cover Classification | Landcover OOD (test) | Accuracy | 66.06 | 6 |
| Land Cover Classification | Landcover IID (test) | Accuracy | 72.56 | 6 |
| Synthetic Data Classification | Spatial Heterogeneity Synthetic (ps(r)=(0.999, 0.999, 0.7, 0.7), pv=0.9) (test) | Mean Accuracy | 88.66 | 5 |
| Synthetic Data Classification | Spatial Heterogeneity ps(r)=(0.999, 0.999, 0.7, 0.7), pv=0.8 Synthetic (test) | Mean Accuracy (Test) | 79.16 | 5 |
Showing 10 of 14 rows
