Diverse Demonstrations Improve In-context Compositional Generalization
About
In-context learning has shown great success in i.i.d. semantic parsing splits, where the training and test sets are drawn from the same distribution. In this setup, models are typically prompted with demonstrations that are similar to the input utterance. However, in compositional generalization, where models are tested on outputs with structures that are absent from the training set, selecting similar demonstrations is insufficient, as often no example will be similar enough to the input. In this work, we propose a method for selecting diverse demonstrations that collectively aim to cover all of the structures required in the output program, in order to encourage the model to generalize to new structures from these demonstrations. We empirically show that combining diverse demonstrations with in-context learning substantially improves performance across three compositional generalization semantic parsing datasets, both in the pure in-context learning setup and when combined with finetuning.
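To make the idea of demonstrations that collectively cover the required structures concrete, here is a minimal sketch of a greedy coverage-based selector. It is an illustration only, not the authors' algorithm: the function name `select_diverse`, the representation of each demonstration as a set of structure labels, and the assumption that the target structures are already known are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set


def select_diverse(
    candidates: Dict[str, Set[str]],  # hypothetical: demo id -> structures it contains
    target: Set[str],                 # hypothetical: structures assumed to be needed
    k: int,                           # maximum number of demonstrations to select
) -> List[str]:
    """Greedily pick demonstrations that cover the most still-uncovered structures."""
    uncovered = set(target)
    remaining = dict(candidates)
    chosen: List[str] = []
    while remaining and uncovered and len(chosen) < k:
        # Iterate in sorted order so ties are broken deterministically by id.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining demonstration adds new coverage
        chosen.append(best)
        uncovered -= remaining.pop(best)
    return chosen


# Toy pool of demonstrations, each annotated with made-up structure labels.
candidates = {
    "ex1": {"largest", "state"},
    "ex2": {"state", "river"},
    "ex3": {"population", "city"},
    "ex4": {"largest", "state", "river"},
}
target = {"largest", "river", "population"}
selected = select_diverse(candidates, target, k=2)
print(selected)  # ['ex4', 'ex3']
```

In this toy run, a similarity-only selector might return `ex4` and `ex1`, which overlap heavily; the greedy cover instead adds `ex3` because it contributes the one structure still missing.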
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Intent Classification | Banking77 (test) | Accuracy | 83.5 | 151 |
| Semantic Parsing | GeoQuery (i.i.d.) | Exact Match Accuracy | 91.4 | 32 |
| Semantic Parsing | GeoQuery compositional | Accuracy | 81.6 | 29 |
| Semantic Parsing | SMCalFlow-CS (16-C) | Accuracy | 73.5 | 20 |
| Semantic Parsing | GeoQuery (Len.) | Exact Match Accuracy | 74.3 | 17 |
| Semantic Parsing | GeoQuery (TMCD) | Exact Match Acc | 82.8 | 12 |
| Semantic Parsing | SMCalFlow CS (8-C) | Exact Match Accuracy | 77.3 | 8 |
| Semantic Parsing | SMCalFlow-CS (0-C) | Exact Match Acc | 40.7 | 8 |
| Semantic Parsing | COVR-10 (test) | Exact Match Accuracy | 83.2 | 8 |
| Semantic Parsing | GeoQuery (Templ.) | Exact Match Accuracy | 85.3 | 8 |