SemBleu: A Robust Metric for AMR Parsing Evaluation

About

Evaluating AMR parsing accuracy involves comparing pairs of AMR graphs. The major evaluation metric, SMATCH (Cai and Knight, 2013), searches for one-to-one mappings between the nodes of two AMRs with a greedy hill-climbing algorithm, which leads to search errors. We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. It does not suffer from search errors and considers non-local correspondences in addition to local ones. SEMBLEU is fully content-driven and punishes situations where a system's output does not preserve most information from the input. Preliminary experiments on both sentence and corpus levels show that SEMBLEU has slightly higher consistency with human judgments than SMATCH. Our code is available at http://github.com/freesunshine0316/sembleu.

Linfeng Song, Daniel Gildea • 2019
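
The idea in the abstract, scoring an AMR graph by BLEU-style n-gram overlap rather than by searching for a node mapping, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that an "n-gram" is a directed path of n node labels with the edge labels between them, that the n-gram weights are uniform, that no smoothing is applied, and that the brevity penalty compares graph sizes counted as nodes plus edges. The names `path_ngrams` and `graph_bleu` and the toy graphs are hypothetical.

```python
import math
from collections import Counter


def path_ngrams(nodes, edges, max_n=3):
    """Count label paths containing 1..max_n nodes, following edge direction.

    nodes: dict mapping node id -> concept label
    edges: list of (source id, relation label, target id)
    """
    outgoing = {}
    for src, rel, tgt in edges:
        outgoing.setdefault(src, []).append((rel, tgt))
    grams = {n: Counter() for n in range(1, max_n + 1)}

    def extend(node, path, n):
        grams[n][path] += 1
        if n == max_n:  # depth bound also guarantees termination on cycles
            return
        for rel, nxt in outgoing.get(node, []):
            extend(nxt, path + (rel, nodes[nxt]), n + 1)

    for node, label in nodes.items():
        extend(node, (label,), 1)
    return grams


def graph_bleu(cand, ref, max_n=3):
    """BLEU-style score between two (nodes, edges) graphs. Assumed sketch."""
    c_grams = path_ngrams(cand[0], cand[1], max_n)
    r_grams = path_ngrams(ref[0], ref[1], max_n)
    log_sum = 0.0
    for n in range(1, max_n + 1):
        total = sum(c_grams[n].values())
        # Counter intersection keeps the minimum count: clipped matches.
        matched = sum((c_grams[n] & r_grams[n]).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_sum += math.log(matched / total) / max_n
    # Assumption: graph size = number of nodes + number of edges.
    c_size = len(cand[0]) + len(cand[1])
    r_size = len(ref[0]) + len(ref[1])
    bp = 1.0 if c_size >= r_size else math.exp(1 - r_size / c_size)
    return bp * math.exp(log_sum)


# Toy graphs for "The boy wants to go" and a variant with "girl".
edges = [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")]
ref = ({"w": "want-01", "b": "boy", "g": "go-01"}, edges)
cand = ({"w": "want-01", "b": "girl", "g": "go-01"}, edges)

print(graph_bleu(ref, ref))
print(graph_bleu(cand, ref, max_n=2))
```

Because n-grams are extracted deterministically from each graph, the score involves no search over node alignments, which is the property the abstract contrasts with SMATCH's hill-climbing. Longer paths capture the non-local correspondences it mentions.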

Related benchmarks

Task	Dataset	Result
Semantic Similarity	STS-B (test)	--	18
AMR Similarity Consistency	BAMBOO (test)	Main - STS-B 66.03	17
Semantic Textual Similarity	STS SemEval-2017 Task 1 (test)	--	8
Semantic Similarity	SICK-R (test)	Semantic Consistency (spring) 60.15	5
Structural Consistency	RARE (test)	Structural Consistency 94.83	5
Semantic Relatedness	SICK filtered 2014 (test)	RMSE 0.38	3
AMR Parsing Evaluation	LDC2015E86 (test)	Comparison: CAMR vs JAMR 69	2
