GPT Understands, Too
About
Prompting a pretrained language model with natural-language patterns has proven effective for natural language understanding (NLU). However, our preliminary study reveals that manual discrete prompts often lead to unstable performance -- e.g., changing a single word in the prompt can cause a substantial performance drop. We propose P-Tuning, a novel method that employs trainable continuous prompt embeddings concatenated with discrete prompts. Empirically, P-Tuning not only stabilizes training by narrowing the performance gap across different discrete prompts, but also improves performance by a sizeable margin on a wide range of NLU tasks, including LAMA and SuperGLUE. P-Tuning is effective for both frozen and tuned language models, under both fully-supervised and few-shot settings.
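The core idea above can be sketched in a few lines: learnable continuous "virtual token" embeddings are concatenated with the embeddings of a discrete natural-language prompt before the sequence is fed to the language model. The sketch below is a minimal illustration using numpy; all names (`word_embeddings`, `prompt_embeddings`, `build_input`) are hypothetical and do not come from the paper's actual code release, and the trainable part would in practice be optimized by gradient descent while the LM weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, num_virtual_tokens = 100, 16, 4

# Frozen word-embedding table of the pretrained LM (random stand-in here).
word_embeddings = rng.normal(size=(vocab_size, embed_dim))

# Trainable continuous prompt embeddings ("virtual tokens"); these are the
# parameters P-Tuning learns, while the LM itself can remain frozen.
prompt_embeddings = rng.normal(size=(num_virtual_tokens, embed_dim))

def build_input(discrete_token_ids):
    """Concatenate the continuous prompt embeddings with the embeddings
    of a discrete (natural-language) prompt to form the LM input."""
    discrete_part = word_embeddings[discrete_token_ids]  # (T, D)
    return np.concatenate([prompt_embeddings, discrete_part], axis=0)

inputs = build_input([5, 17, 42])  # 3 discrete prompt tokens
print(inputs.shape)                # (4 virtual + 3 discrete, 16)
```

Because only the small `prompt_embeddings` matrix is updated, the method adds very few trainable parameters compared with full fine-tuning, which is what makes it applicable to frozen models.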
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score | 18.11 | 241 |
| Mathematical Reasoning | GSM8K | Accuracy | 2.65 | 57 |
| Binary Classification | GLUE (test) | QNLI Accuracy | 58.8 | 25 |
| Dialogue Generation | ConvAI2 | BLEU | 1.5 | 24 |
| Sentiment Classification | SST-5 (32 samples) | Accuracy | 40.9 | 11 |
| Sentiment Classification | SST-2 (32 samples) | Accuracy | 87.6 | 11 |