Title: BLiMP: The Benchmark of Linguistic Minimal Pairs for English
(Submitted on 2 Dec 2019 (v1), revised 23 Sep 2020 (this version, v3), latest version 14 Feb 2023 (v4))
Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
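The abstract does not spell out the scoring rule. A common way to test an LM on minimal pairs is to check whether it assigns a higher total log-probability to the acceptable sentence than to its minimally different unacceptable counterpart, and then report accuracy over all pairs. The sketch below illustrates that criterion with a toy add-one-smoothed bigram model, one of the simplest n-gram LMs. Everything in it is hypothetical: the corpus, the pairs, and the helper names (`train_bigram`, `log_prob`, `minimal_pair_accuracy`). It is not the paper's evaluation code or data.

```python
import math
from collections import Counter

# Hypothetical toy illustration of minimal-pair scoring; not BLiMP's own code or data.

def train_bigram(corpus):
    """Count unigrams and bigrams over a toy corpus, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a whole sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def minimal_pair_accuracy(pairs, unigrams, bigrams):
    """Fraction of (acceptable, unacceptable) pairs where the acceptable sentence scores higher."""
    correct = sum(
        log_prob(good, unigrams, bigrams) > log_prob(bad, unigrams, bigrams)
        for good, bad in pairs
    )
    return correct / len(pairs)

# Hypothetical training corpus and subject-verb agreement minimal pairs.
corpus = ["the cats sleep", "the cat sleeps", "many cats sleep", "a dog sleeps"]
pairs = [
    ("the cats sleep", "the cats sleeps"),
    ("a dog sleeps", "a dog sleep"),
]

unigrams, bigrams = train_bigram(corpus)
print(minimal_pair_accuracy(pairs, unigrams, bigrams))  # prints 1.0
```

Scoring whole sentences means the model is never told where the contrast lies, so a single number summarizes each pair. Because the two sentences in a pair differ minimally, the comparison is not dominated by overall sentence length.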
Submission history
From: Alex Warstadt
[v1] Mon, 2 Dec 2019 05:42:41 GMT (288kb,D)
[v2] Thu, 16 Apr 2020 02:07:03 GMT (642kb,D)
[v3] Wed, 23 Sep 2020 20:08:54 GMT (642kb,D)
[v4] Tue, 14 Feb 2023 10:33:15 GMT (2189kb,D)