
Heterogeneous Risk Minimization

About

Machine learning algorithms trained with empirical risk minimization usually suffer from poor generalization performance due to the greedy exploitation of correlations among the training data, which are not stable under distributional shifts. Recently, some invariant learning methods for out-of-distribution (OOD) generalization have been proposed that leverage multiple training environments to find invariant relationships. However, modern datasets are frequently assembled by merging data from multiple sources without explicit source labels. The resulting unobserved heterogeneity renders many invariant learning methods inapplicable. In this paper, we propose the Heterogeneous Risk Minimization (HRM) framework to jointly learn the latent heterogeneity among the data and the invariant relationships, which leads to stable prediction despite distributional shifts. We theoretically characterize the roles of the environment labels in invariant learning and justify our newly proposed HRM framework. Extensive experimental results validate the effectiveness of our HRM framework.

Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen • 2021
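
The failure mode described in the abstract, where empirical risk minimization exploits a correlation that does not survive a distributional shift, can be illustrated with a small toy sketch. This is not the HRM algorithm and does not use the paper's data. The feature names `s` (invariant) and `v` (spurious), the noise levels, and the sign flip between environments are all assumptions made for illustration. The "invariant-only" baseline is an oracle that is told which feature is stable, something HRM would have to infer without environment labels.

```python
import random


def make_data(n, spurious_sign, rng):
    """Sample (s, v, y) rows; s is the invariant feature, v the spurious one."""
    rows = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        # Invariant mechanism: y depends on s the same way in every environment.
        y = s + rng.gauss(0.0, 1.0)
        # Spurious feature: tracks y closely, but the sign of the link
        # depends on the environment (hypothetical setup).
        v = spurious_sign * y + rng.gauss(0.0, 0.1)
        rows.append((s, v, y))
    return rows


def fit_erm(rows):
    """Least squares on both features (no intercept) via 2x2 normal equations."""
    a = sum(s * s for s, v, y in rows)
    b = sum(s * v for s, v, y in rows)
    d = sum(v * v for s, v, y in rows)
    p = sum(s * y for s, v, y in rows)
    q = sum(v * y for s, v, y in rows)
    det = a * d - b * b
    return ((d * p - b * q) / det, (a * q - b * p) / det)


def fit_invariant_only(rows):
    """Oracle baseline: least squares on s alone, ignoring v."""
    w_s = sum(s * y for s, v, y in rows) / sum(s * s for s, v, y in rows)
    return (w_s, 0.0)


def mse(rows, w):
    """Mean squared error of the linear predictor w[0]*s + w[1]*v."""
    return sum((w[0] * s + w[1] * v - y) ** 2 for s, v, y in rows) / len(rows)


rng = random.Random(0)
train = make_data(5000, +1.0, rng)  # v agrees with y during training
test = make_data(5000, -1.0, rng)   # the link flips at test time

w_erm = fit_erm(train)
w_inv = fit_invariant_only(train)

print("ERM weights (s, v):", w_erm)
print("ERM       train/test MSE:", mse(train, w_erm), mse(test, w_erm))
print("Invariant train/test MSE:", mse(train, w_inv), mse(test, w_inv))
```

In this setup the pooled least-squares fit puts most of its weight on `v`, so its training error is small but its test error grows sharply once the link flips. The predictor restricted to `s` has a higher training error that stays roughly the same under the shift.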

Related benchmarks

Task | Dataset | Metric | Result | Rank
Classification | Temporal heterogeneity synthetic datasets | Mean Accuracy | 50.01 | 30
House price prediction | Kaggle House Price Mean (test) | MSE | 0.4221 | 8
House price prediction | Kaggle House Price Worst (test) | MSE | 0.5721 | 8
House price prediction | Kaggle House Price (train) | MSE | 0.319 | 8
Synthetic Data Classification | Spatial Heterogeneity Synthetic (ps(r)=(0.999, 0.999, 0.7, 0.7), pv=0.9) (test) | Mean Accuracy | 49.98 | 5
Synthetic Data Classification | Spatial Heterogeneity (ps(r)=(0.999, 0.9, 0.8, 0.7), pv=0.9) synthetic (test) | Mean Accuracy | 49.99 | 5
Synthetic Data Classification | Spatial Heterogeneity ps(r)=(0.999, 0.999, 0.7, 0.7), pv=0.8 Synthetic (test) | Mean Accuracy (Test) | 49.97 | 5
Synthetic Data Classification | Spatial Heterogeneity (ps(r)=(0.999, 0.9, 0.8, 0.7), pv=0.8) synthetic (test) | Test Mean Accuracy | 50 | 5
Synthetic Data Classification | Spatial Heterogeneity (ps(r)=(0.999, 0.999, 0.8, 0.8), pv=0.9) Synthetic (test) | Mean Accuracy (Test) | 49.97 | 5
Synthetic Data Classification | Spatial Heterogeneity (ps(r)=(0.999, 0.999, 0.8, 0.8), pv=0.8) synthetic (test) | Test Mean Acc | 49.99 | 5