
NUBIA: NeUral Based Interchangeability Assessor for Text Generation

About

We present NUBIA, a methodology to build automatic evaluation metrics for text generation using only machine learning models as core components. A typical NUBIA model is composed of three modules: a neural feature extractor, an aggregator, and a calibrator. We demonstrate an implementation of NUBIA that outperforms the metrics currently used to evaluate machine translation and summaries, and that matches or slightly exceeds state-of-the-art metrics in correlation with human judgment on the WMT segment-level Direct Assessment task, sentence-level ranking, and image captioning evaluation. The implemented model is modular, explainable, and set to improve continuously over time.
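The three-module layout described above (feature extractor, aggregator, calibrator) can be sketched as a small pipeline. This is only an illustration of the structure: the abstract does not specify the internals of any module, so every function here (`toy_feature_extractor`, `toy_aggregator`, `toy_calibrator`), the class `ThreeStageMetric`, the hand-made features, and the weights are hypothetical stand-ins, not the authors' models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def toy_feature_extractor(reference: str, candidate: str) -> List[float]:
    """Hypothetical stand-in for a neural feature extractor.

    Returns two hand-made features: token Jaccard overlap and length ratio.
    A real extractor would use learned models instead.
    """
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    union = ref_tokens | cand_tokens
    jaccard = len(ref_tokens & cand_tokens) / len(union) if union else 1.0

    ref_len, cand_len = len(reference.split()), len(candidate.split())
    longest = max(ref_len, cand_len)
    length_ratio = min(ref_len, cand_len) / longest if longest else 1.0
    return [jaccard, length_ratio]


def toy_aggregator(features: Sequence[float]) -> float:
    """Hypothetical aggregator: a fixed weighted sum (weights are made up)."""
    weights = [0.8, 0.2]
    return sum(w * f for w, f in zip(weights, features))


def toy_calibrator(raw_score: float) -> float:
    """Hypothetical calibrator: clamp the raw score into [0, 1]."""
    return max(0.0, min(1.0, raw_score))


@dataclass
class ThreeStageMetric:
    """Chains the three modules; each one can be swapped independently."""
    extract: Callable[[str, str], List[float]]
    aggregate: Callable[[Sequence[float]], float]
    calibrate: Callable[[float], float]

    def score(self, reference: str, candidate: str) -> float:
        features = self.extract(reference, candidate)
        return self.calibrate(self.aggregate(features))


metric = ThreeStageMetric(toy_feature_extractor, toy_aggregator, toy_calibrator)
print(metric.score("the cat sat on the mat", "the cat sat on a mat"))
```

Keeping the three stages as separate callables mirrors the modularity the abstract claims: any one stage could be replaced without touching the other two.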

Hassan Kane, Muhammed Yusuf Kocyigit, Ali Abdalla, Pelkins Ajanoh, Mohamed Coulibali • 2020

Related benchmarks

Task                             Dataset                      Result               Rank
Image Captioning Evaluation      Flickr8K Expert (test)       Kendall tau_c: 49.5  76
Correlation with human judgment  Flickr8K Expert 2013 (full)  Kendall's Tau: 49.5  14
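Both benchmark rows report a Kendall rank correlation between metric scores and human ratings. As a rough illustration of how such a figure can be computed, here is a minimal pure-Python sketch of Stuart's tau-c, defined as 2(C - D) / (n^2 (m - 1) / m), where C and D count concordant and discordant pairs, n is the number of items, and m is the smaller number of distinct values in the two lists. The function name `kendall_tau_c` and the sample data are made up for this example and are not taken from the benchmark.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_c(x: Sequence[float], y: Sequence[float]) -> float:
    """Stuart's tau-c between two equal-length score lists (O(n^2) pairs)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    # m is the smaller number of distinct values across the two variables.
    m = min(len(set(x)), len(set(y)))
    if n < 2 or m < 2:
        raise ValueError("need at least two items and two distinct values")

    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Pairs tied in x or y count toward neither total.

    return 2 * (concordant - discordant) / (n * n * (m - 1) / m)


# Made-up example: coarse human ratings against fine-grained metric scores.
human_ratings = [1, 1, 2, 2, 3, 3]
metric_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
print(kendall_tau_c(human_ratings, metric_scores))
```

The tau-c variant is commonly chosen when the two variables have different numbers of distinct values, as with a few discrete human rating levels against continuous metric scores.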
