Semiparametric Token-Sequence Co-Supervision

About

In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from the traditional next token prediction loss, which is calculated over the parametric token embedding space, and the next sequence prediction loss, which is calculated over the nonparametric sequence embedding space. The nonparametric sequence embedding space is constructed by a separate language model tasked to condense an input text into a single representative embedding. Our experiments demonstrate that a model trained via both supervisions consistently surpasses models trained via each supervision independently. Analysis suggests that this co-supervision encourages a broader generalization capability across the model. In particular, the robustness of the parametric token space, which is established during the pretraining step, tends to effectively enhance the stability of the nonparametric sequence embedding space, a new space established by another language model.
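
As a rough illustration of the co-supervision idea in the abstract, the sketch below adds a next token cross-entropy (over vocabulary logits) to a next sequence cross-entropy (over similarity scores between a query vector and candidate sequence embeddings). The function names, the dot-product scoring, the candidate set, and the weight `lam` are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def next_token_loss(token_logits, target_token):
    """Cross-entropy over the parametric token space (vocabulary logits)."""
    return -log_softmax(token_logits)[target_token]


def next_sequence_loss(query, sequence_embeddings, target_index):
    """Cross-entropy over the nonparametric sequence space.

    Assumption: each candidate embedding (produced by a separate encoder
    model) is scored by its dot product with the generator's query vector.
    """
    scores = [
        sum(q * e for q, e in zip(query, emb)) for emb in sequence_embeddings
    ]
    return -log_softmax(scores)[target_index]


def co_supervision_loss(token_logits, target_token,
                        query, sequence_embeddings, target_index,
                        lam=1.0):
    """Sum of both supervision signals; `lam` is a hypothetical weight."""
    return (next_token_loss(token_logits, target_token)
            + lam * next_sequence_loss(query, sequence_embeddings, target_index))
```

In this toy form, both terms share the same shape (a softmax cross-entropy); they differ only in whether the candidates are rows of a learned token embedding table or embeddings computed on the fly by another model.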

Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Fact Verification	KILT FEVER (test)	Retrieval	77.5	4
Knowledge Grounded Dialogue	KILT WoW (test)	Retrieval	49.8	4
Knowledge-grounded Generation	ASQA ALCE (test)	Correctness	31.8	4
Knowledge-grounded Generation	ELI5 ALCE (test)	Correctness	10.5	4
Long-form Question Answering	KILT ELI5 (test)	Retrieval Score	36.3	4
Multi-hop Question Answering	KILT HotpotQA (test)	Retrieval	55.6	4
Open-domain Question Answering	KILT NQ* (test)	Retrieval Rate	65.1	4
Open-domain Question Answering	KILT TriviaQA (test)	Retrieval	74.5	4
Slot Filling	KILT ZSRE (test)	Retrieval	80.5	4
Slot Filling	KILT T-REX (test)	Retrieval Score	75.5	4
