DoCoGen: Domain Counterfactual Generation for Low Resource Domain Adaptation
About
Natural language processing (NLP) algorithms have become very successful, but they still struggle when applied to out-of-distribution examples. In this paper we propose a controllable generation approach to deal with this domain adaptation (DA) challenge. Given an input text example, our DoCoGen algorithm generates a domain-counterfactual textual example (D-con) that is similar to the original in all aspects, including the task label, but whose domain is changed to a desired one. Importantly, DoCoGen is trained using only unlabeled examples from multiple domains: no NLP task labels or parallel pairs of textual examples and their domain-counterfactuals are required. We show that DoCoGen can generate coherent counterfactuals consisting of multiple sentences. We use the D-cons generated by DoCoGen to augment a sentiment classifier and a multi-label intent classifier in 20 and 78 DA setups, respectively, where source-domain labeled data is scarce. Our model outperforms strong baselines and improves the accuracy of a state-of-the-art unsupervised DA algorithm.
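The augmentation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `augment_with_dcons`, the `generate_dcon` callable, the `toy_generator` placeholder, and the domain names are all hypothetical. The sketch only shows the data flow, in which each labeled source example yields one D-con per other domain and every D-con inherits the original task label.

```python
from typing import Callable, Iterable, List, Tuple

# (text, task_label, domain); this tuple layout is an assumption for the sketch.
Example = Tuple[str, str, str]


def augment_with_dcons(
    labeled: Iterable[Example],
    target_domains: Iterable[str],
    generate_dcon: Callable[[str, str], str],
) -> List[Example]:
    """Return the original examples plus one D-con per (example, other domain)."""
    domains = list(target_domains)
    augmented: List[Example] = []
    for text, label, domain in labeled:
        augmented.append((text, label, domain))
        for target in domains:
            if target == domain:
                continue  # no counterfactual needed for the example's own domain
            # The D-con keeps the task label; only the domain changes.
            augmented.append((generate_dcon(text, target), label, target))
    return augmented


def toy_generator(text: str, target_domain: str) -> str:
    # Hypothetical placeholder for a trained generator: it only tags the text
    # with the target domain and performs no real rewriting.
    return f"[{target_domain}] {text}"


if __name__ == "__main__":
    data = [("great battery life", "positive", "electronics")]
    for row in augment_with_dcons(data, ["electronics", "kitchen", "books"], toy_generator):
        print(row)
```

In practice `toy_generator` would be replaced by a trained controllable generation model, and the augmented list would be passed to the classifier's training loop.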
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sentiment Classification | Multi-Domain Sentiment Dataset 2008 (test) | Accuracy (A->D) | 0.838 | 12 |
| Sentiment Classification | Blitzer 2006 (test) | A to B Accuracy | 84.4 | 9 |
| Intent Prediction | MANTIS | AP | 77.1 | 4 |
| Human intrinsic evaluation of domain counterfactual generation | Product Review Multi-domain dataset Domains A, D, E, K subset of 20 reviews | Domain Relevance (D) | 85 | 3 |