
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

About

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.

Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaid Harchaoui • 2021
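The abstract describes MAUVE only at a high level: quantize text embeddings, then compare the model distribution P with the human distribution Q through a divergence frontier. The pure-Python sketch below illustrates that pipeline under assumptions that are not stated above. The embeddings (`p_feats` for model text, `q_feats` for human text) are taken as already computed by some feature extractor. Quantization is plain k-means with a deterministic farthest-point initialization. The frontier is traced with mixtures R_λ = λP + (1 - λ)Q, the two KL divergences to R_λ are mapped through exp(-c · KL), and the score is the trapezoid-rule area under that curve. The function names, the constant c = 5, and the 99 mixture weights are hypothetical choices for illustration. This is not the authors' released implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(p_feats: Sequence[Vector], q_feats: Sequence[Vector],
             k: int, iters: int = 20) -> Tuple[List[float], List[float]]:
    """Jointly cluster both embedding sets with Lloyd's k-means and return
    the normalized cluster histograms of P (model) and Q (human)."""
    points = [list(v) for v in p_feats] + [list(v) for v in q_feats]
    # Assumed init: deterministic farthest-point traversal (not k-means++).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(_sq_dist(x, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: _sq_dist(x, centers[j])) for x in points]
        # Update step: each non-empty cluster's center moves to its members' mean.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    n_p = len(p_feats)
    p_hist, q_hist = [0.0] * k, [0.0] * k
    for i, lab in enumerate(labels):
        if i < n_p:
            p_hist[lab] += 1.0
        else:
            q_hist[lab] += 1.0
    return [v / n_p for v in p_hist], [v / len(q_feats) for v in q_hist]


def kl(a: Sequence[float], b: Sequence[float]) -> float:
    """KL(a || b) in nats; bins with a_i = 0 contribute nothing."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def mauve_sketch(p_hist: Sequence[float], q_hist: Sequence[float],
                 c: float = 5.0, n: int = 99) -> float:
    """Area under the curve of points
    (exp(-c * KL(Q || R)), exp(-c * KL(P || R))) for R = lam*P + (1-lam)*Q."""
    curve = [(0.0, 1.0)]  # left endpoint of the curve
    # Walk lam from near 1 down to near 0 (strictly inside (0, 1), so R > 0
    # wherever P or Q is positive); x increases along the way.
    for i in range(1, n + 1):
        lam = 1.0 - i / (n + 1)
        r = [lam * p + (1.0 - lam) * q for p, q in zip(p_hist, q_hist)]
        curve.append((math.exp(-c * kl(q_hist, r)), math.exp(-c * kl(p_hist, r))))
    curve.append((1.0, 0.0))  # right endpoint
    # Trapezoid rule over consecutive points of the curve.
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(curve, curve[1:]))
```

With this construction, identical histograms give a score of essentially 1, and histograms with disjoint support give a score near 0. A real feature extractor, a different quantizer, or a different c would change the numbers, so the sketch only illustrates the shape of the computation.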

Related benchmarks

Task                       | Dataset                       | Metric                   | Result | Rank
Sentiment Classification   | Twitter Financial News (test) | F1 Score                 | 0.546  | 23
Text Generation            | WebText                       | --                       | --     | 15
Question Answering         | TruthfulQA                    | Correlation Score        | 96.81  | 10
Factuality Verification    | CLIMATE-FEVER                 | Correctness Score        | 70.95  | 10
Natural Language Inference | SNLI                          | Correlation Coefficient  | 41.5   | 10
Image Captioning           | COCO Caption                  | Correlation Score        | 6.87   | 10
Summarization              | NEWTS                         | Correlation Score (Corr) | 1.64   | 10
Image Classification       | unmet-promise (Split 2)       | Accuracy                 | 56.3   | 9
Text2SQL                   | BIRD App Store                | Execution Accuracy       | 38.4   | 9
Text2SQL                   | BIRD Computer Students        | Execution Accuracy       | 48.3   | 9

Showing 10 of 28 rows.

