
Adversarial NLI: A New Benchmark for Natural Language Understanding

About

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.
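The iterative, adversarial human-and-model-in-the-loop procedure can be sketched as follows. This is an illustrative toy, not the authors' code: the model, the annotator, and the retry budget (`max_tries`) are all stand-ins, and a real round would use a trained NLI model and human writers, with separate human verification of the collected labels.

```python
# Toy sketch of one round of human-and-model-in-the-loop collection:
# an annotator writes hypotheses aimed at a target label, and examples
# that fool the current model are kept for the next round's training set.

LABELS = ("entailment", "neutral", "contradiction")

def model_predict(premise: str, hypothesis: str) -> str:
    """Stand-in for the current NLI model; a real round would use a
    strong trained model (e.g. a transformer fine-tuned on NLI data)."""
    return "entailment" if hypothesis in premise else "neutral"

def collect_round(contexts, annotate, max_tries=3):
    """Collect (premise, hypothesis, label) triples the model gets wrong.

    `annotate(premise, target)` stands in for the human annotator: it
    returns a hypothesis intended to have gold label `target`.
    """
    fooled = []
    for premise in contexts:
        for target in LABELS:
            for _ in range(max_tries):
                hypothesis = annotate(premise, target)
                if model_predict(premise, hypothesis) != target:
                    # Model fooled: keep the example, move to next label.
                    fooled.append((premise, hypothesis, target))
                    break
    return fooled

# A scripted "annotator" for demonstration only.
def scripted_annotator(premise, target):
    return premise if target == "entailment" else f"not {premise}"

examples = collect_round(["the cat sat"], scripted_annotator)
```

Repeating rounds against a model retrained on each round's collected examples is what makes the benchmark a "moving target" rather than a static test set.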

Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela • 2019

Related benchmarks

Task                           | Dataset           | Result                 | Rank
Natural Language Inference     | SNLI (test)       | Accuracy: 91           | 681
Natural Language Understanding | GLUE (test)       | SST-2 Accuracy: 94.9   | 416
Natural Language Inference     | SciTail (test)    | Accuracy: 94.4         | 86
Natural Language Inference     | SNLI (dev)        | Accuracy: 91.7         | 71
Factual Consistency Evaluation | TRUE benchmark    | PAWS (AUC-ROC): 86.35  | 37
Natural Language Inference     | ANLI (test)       | Overall Score: 55.1    | 28
Natural Language Inference     | MNLI (val)        | Accuracy: 90.01        | 26
Natural Language Inference     | ANLI (val)        | Accuracy: 73.37        | 21
Natural Language Inference     | WANLI (test)      | Accuracy: 67.04        | 21
Natural Language Inference     | GNLI Human (test) | Accuracy: 82.86        | 21

Showing 10 of 17 rows
