Learning explanations that are hard to vary

About

In this paper, we investigate the principle that "good explanations are hard to vary" in the context of deep learning. We show that averaging gradients across examples (akin to a logical OR of patterns) can favor memorization and "patchwork" solutions that sew together different strategies, instead of identifying invariances. To inspect this, we first formalize a notion of consistency for minima of the loss surface, which measures to what extent a minimum appears only when examples are pooled. We then propose and experimentally validate a simple alternative algorithm based on a logical AND that focuses on invariances and prevents memorization in a set of real-world tasks. Finally, using a synthetic dataset with a clear distinction between invariant and spurious mechanisms, we dissect learning signals and compare this approach to well-established regularizers.
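
The contrast between averaging gradients (the logical OR) and a logical AND can be illustrated with a small sketch. The abstract does not spell out the algorithm, so the sketch assumes the AND is realized as sign agreement: a gradient component is kept only if enough environments agree on its sign. The function names and the `agreement_threshold` parameter are hypothetical and used for illustration only.

```python
def mean_gradient(env_grads):
    """Plain averaging: every environment contributes to every component."""
    n_envs = len(env_grads)
    n_params = len(env_grads[0])
    return [sum(g[j] for g in env_grads) / n_envs for j in range(n_params)]


def and_mask_gradient(env_grads, agreement_threshold=1.0):
    """Average per-environment gradients, zeroing components whose signs disagree.

    env_grads: list of gradient vectors (lists of floats), one per environment.
    agreement_threshold: hypothetical parameter in [0, 1]; 1.0 requires every
    environment to agree on the sign of a component (assumed for this sketch).
    """
    n_envs = len(env_grads)
    n_params = len(env_grads[0])
    masked = []
    for j in range(n_params):
        column = [g[j] for g in env_grads]
        # Sign of each environment's gradient for this component: -1, 0, or 1.
        signs = [(x > 0) - (x < 0) for x in column]
        # Fraction of net sign agreement: 1.0 when all signs match, 0.0 when they cancel.
        agreement = abs(sum(signs)) / n_envs
        mean = sum(column) / n_envs
        masked.append(mean if agreement >= agreement_threshold else 0.0)
    return masked


# Two environments, two parameters: they agree on the first, disagree on the second.
grads = [[0.5, 2.0], [0.25, -0.5]]
print(mean_gradient(grads))      # [0.375, 0.75]
print(and_mask_gradient(grads))  # [0.375, 0.0]
```

With plain averaging, the second component still receives an update because one environment dominates it. Under the assumed sign-agreement mask, that component is zeroed, so only the direction shared by both environments is followed.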

Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schölkopf • 2020

Related benchmarks

Task	Dataset	Result
Domain Generalization	VLCS	Accuracy 78.1	270
Domain Generalization	PACS	Accuracy 84.4	263
Domain Generalization	OfficeHome	Accuracy 65.6	234
Domain Generalization	DomainNet	Accuracy 37.2	153
Domain Generalization	TerraIncognita	Accuracy 44.6	121
Cross-user Activity Recognition	DSADS (cross-user)	Accuracy (ABC->D) 82.36	7
Cross-user Activity Recognition	PAMAP2	Acc (AB->C) 58.75	7
