Challenges in Data-to-Document Generation

About

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

Sam Wiseman, Stuart M. Shieber, Alexander M. Rush • 2017

Related benchmarks

Task	Dataset	Result
Summarization	arXiv	ROUGE-2 7.42	76
Summarization	Pubmed	ROUGE-1 33.89	70
Data-to-text generation	MLB (test)	RG Precision 99.9	22
Data-to-text generation	RotoWire (test)	Factual Support Score 7.57	19
Data-to-text generation	ROTOWIRE (dev)	RG Score 0.5429	12
Data-to-text generation	ROTOWIRE English (test)	RG Score 54.3	12
Knowledge Selection	RotoWire-FG	Relation Generation P 93.72	10
Data-to-text generation	German ROTOWIRE (DE-RW) (test)	RG Score 54.4	8
Data-to-text generation	MLB (dev)	RG Score 59.93	4
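
The RG (relation generation) entries above come from the extractive evaluation mentioned in the abstract. As a rough sketch only: assuming an information-extraction step has already turned the generated text into (entity, value, record type) tuples, RG precision can be read as the fraction of unique extracted tuples that also appear in the source records. The function name `rg_precision`, the tuple layout, and the example values are hypothetical illustrations, not taken from the paper's code or data.

```python
from typing import Iterable, Set, Tuple

# Hypothetical relation layout: (entity, value, record type).
Relation = Tuple[str, str, str]


def rg_precision(extracted: Iterable[Relation],
                 records: Set[Relation]) -> Tuple[float, int]:
    """Return (precision, count) over unique relations extracted from a text.

    precision = supported unique relations / all unique extracted relations
    count     = number of unique extracted relations found in the records
    """
    unique = set(extracted)          # duplicates in the text count once
    if not unique:
        return 0.0, 0                # nothing extracted: define precision as 0
    supported = unique & records     # relations backed by the source data
    return len(supported) / len(unique), len(supported)


# Made-up example data, for illustration only.
records = {
    ("Team A", "103", "TEAM-PTS"),
    ("Team B", "95", "TEAM-PTS"),
    ("Player X", "27", "PLAYER-PTS"),
}
extracted = [
    ("Team A", "103", "TEAM-PTS"),
    ("Team B", "95", "TEAM-PTS"),
    ("Player X", "30", "PLAYER-PTS"),   # not supported by the records
    ("Team A", "103", "TEAM-PTS"),      # duplicate mention
]

precision, count = rg_precision(extracted, records)
```

Here two of the three unique extracted relations are supported, so `precision` is 2/3 and `count` is 2. The quality of such a score depends on the extraction step, which this sketch takes as given.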

