MENLI: Robust Evaluation Metrics from Natural Language Inference

Chen, Yanran; Eger, Steffen

(Submitted on 15 Aug 2022 (this version), latest version 26 Dec 2023 (v5))

Abstract: Recently proposed BERT-based evaluation metrics perform well on standard evaluation benchmarks but are vulnerable to adversarial attacks, e.g., those relating to factuality errors. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics but perform below SOTA MT metrics. However, when we combine existing metrics with our NLI metrics, we obtain both higher adversarial robustness (+20% to +30%) and higher-quality metrics as measured on standard benchmarks (+5% to +25%).
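The abstract does not say how the metrics are combined or how the preference-based attack is scored, so the following is only a minimal sketch under stated assumptions. It assumes a weighted average of min-max normalized scores for the combination, and it assumes that a metric "passes" an adversarial test case when it scores a meaning-preserving candidate above a perturbed one. The names `min_max`, `combine`, `preference_accuracy`, and the weight `w_nli` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(
    base_scores: Sequence[float],
    nli_scores: Sequence[float],
    w_nli: float = 0.2,  # assumed weight, not a value from the paper
) -> List[float]:
    """Assumed combination: weighted average of normalized scores."""
    if len(base_scores) != len(nli_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= w_nli <= 1.0:
        raise ValueError("w_nli must lie in [0, 1]")
    base_n = min_max(base_scores)
    nli_n = min_max(nli_scores)
    return [(1.0 - w_nli) * b + w_nli * n for b, n in zip(base_n, nli_n)]


def preference_accuracy(
    metric: Callable[[str, str], float],
    triples: Sequence[Tuple[str, str, str]],
) -> float:
    """Fraction of (reference, faithful, adversarial) triples for which
    the metric scores the faithful candidate strictly higher."""
    if not triples:
        raise ValueError("need at least one triple")
    wins = sum(
        1 for ref, good, bad in triples if metric(ref, good) > metric(ref, bad)
    )
    return wins / len(triples)
```

Under these assumptions, `preference_accuracy` gives one number per metric that can be compared before and after calling `combine`, which is one way to read the robustness gains the abstract reports.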

Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2208.07316 [cs.CL]
	(or arXiv:2208.07316v1 [cs.CL] for this version)

Submission history

From: Yanran Chen
[v1] Mon, 15 Aug 2022 16:30:14 GMT (5082kb,D)
[v2] Mon, 3 Apr 2023 16:15:04 GMT (1367kb,D)
[v3] Tue, 4 Apr 2023 10:23:20 GMT (1368kb,D)
[v4] Tue, 11 Apr 2023 15:10:05 GMT (1375kb,D)
[v5] Tue, 26 Dec 2023 04:11:04 GMT (1375kb,D)
