Few-Shot Self-Rationalization with Natural Language Prompts

About

Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task, which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using few training examples. We present FEB -- a standardized collection of four existing English-language datasets and associated metrics. We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible. We show there is still ample room for improvement in this task: the average plausibility of generated explanations assessed by human annotators is at most 51% (with GPT-3), while plausibility of human explanations is 76%. We hope that FEB and our proposed approach will spur the community to take on the few-shot self-rationalization challenge.

Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters • 2021

Related benchmarks

Task                                     Dataset                 Metric                  Result  Rank
Natural Language Explanation Generation  ECQA                    Human Evaluation Score  41.92   7
Natural Language Explanation Generation  E-SNLI                  Human Evaluation Score  29.63   7
Natural Language Explanation Generation  ComVE                   Human Evaluation Score  40      7
Natural Language Explanation Generation  SBIC                    Human Evaluation Score  54.44   7
Natural Language Explanation Generation  SBIC 60-shot            Accuracy                63.86   3
Natural Language Explanation Generation  ComVE few-shot 60-shot  Accuracy                63.71   3
Natural Language Explanation Generation  ECQA few-shot 60-shot   Accuracy                11.14   3
Natural Language Explanation Generation  e-SNLI 60-shot          Accuracy                34.91   3
