Title: ConTextual Masked Auto-Encoder for Dense Passage Retrieval

Abstract: Dense passage retrieval aims to retrieve the passages relevant to a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Specifically, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantic correlation between the text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE. Our code is available at this https URL
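
As a rough illustration of the retrieval step described in the abstract (ranking passages by comparing dense vectors), here is a minimal sketch. It assumes inner-product scoring, which is a common choice but is not stated in the abstract. The function names `dot` and `top_k` and the toy 3-dimensional vectors are hypothetical and are not taken from the CoT-MAE code. In practice the vectors would come from a trained encoder.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec, passage_vecs, k=2):
    """Return the k (passage_id, score) pairs with the highest scores.

    Assumption: relevance is scored by inner product between the
    query vector and each passage vector.
    """
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; a real system would obtain
# them by encoding text with a pre-trained language model.
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.7, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = top_k(query, passages, k=2)
print([pid for pid, _ in ranking])
```

The sketch scores every passage exhaustively, which is only practical for small collections. Large corpora typically rely on an approximate nearest-neighbor index instead.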
Comments: This paper has been accepted by AAAI 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2208.07670 [cs.CL]
  (or arXiv:2208.07670v3 [cs.CL] for this version)

Submission history

From: Wu Xing
[v1] Tue, 16 Aug 2022 11:17:22 GMT (625kb,D)
[v2] Wed, 7 Sep 2022 08:59:20 GMT (626kb,D)
[v3] Thu, 1 Dec 2022 23:01:41 GMT (625kb,D)
