Title: UnifieR: A Unified Retriever for Large-Scale Retrieval

Abstract: Large-scale retrieval aims to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLMs) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms expose the PLMs' representation capability at different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representational views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. We further present a uni-retrieval scheme that achieves even better retrieval quality. Finally, we evaluate the model on the BEIR benchmark to verify its transferability.
Comments: To appear at KDD ADS 2023
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as: arXiv:2205.11194 [cs.IR]
  (or arXiv:2205.11194v2 [cs.IR] for this version)

Submission history

From: Tao Shen
[v1] Mon, 23 May 2022 11:01:59 GMT (418kb,D)
[v2] Sun, 4 Jun 2023 12:59:36 GMT (762kb,D)
