
Title: The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

Abstract: On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulty processing long documents. Recent work addressing this issue truncates long documents, segments them into passages that a standard BERT model can handle, or modifies the self-attention mechanism to make it sparser, as in sparse-attention models. However, these approaches either lose information or have high computational complexity (in the latter case, they are costly in time, memory, and energy). We follow a slightly different approach: we first select key blocks of a long document by local query-block pre-ranking, and then aggregate a few of these blocks into a short document that can be processed by a model such as BERT. Experiments conducted on standard information retrieval datasets demonstrate the effectiveness of the proposed approach.
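The key-block selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which scorer performs the local query-block pre-ranking, so this example assumes a BM25 score computed locally over the blocks of a single document. The function names (`select_key_blocks`, `bm25_block_scores`) and the parameters `block_size` and `max_tokens` are hypothetical, and tokens here are whitespace/word tokens rather than BERT wordpieces.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokenizer (assumption; a real system would use BERT wordpieces).
    return re.findall(r"\w+", text.lower())


def split_into_blocks(tokens, block_size):
    # Cut the long document into consecutive fixed-size blocks.
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def bm25_block_scores(query_tokens, blocks, k1=1.2, b=0.75):
    # Assumed pre-ranker: BM25 where the "collection" is the blocks of this one document.
    n = len(blocks)
    avg_len = sum(len(bl) for bl in blocks) / n
    df = Counter()
    for bl in blocks:
        df.update(set(bl))
    scores = []
    for bl in blocks:
        tf = Counter(bl)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(bl) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def select_key_blocks(query, document, block_size=64, max_tokens=256):
    """Hypothetical helper: keep the highest-scoring blocks that fit in max_tokens."""
    blocks = split_into_blocks(tokenize(document), block_size)
    if not blocks:
        return []
    scores = bm25_block_scores(tokenize(query), blocks)
    # sorted() is stable, so tied blocks keep their document order.
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(blocks[i]) > max_tokens:
            continue  # greedily skip blocks that would overflow the budget
        chosen.append(i)
        used += len(blocks[i])
    chosen.sort()  # restore original order so the short document reads naturally
    return [" ".join(blocks[i]) for i in chosen]


doc = ("alpha beta gamma delta attention heads scale poorly "
       "epsilon zeta eta theta long documents need blocks")
print(select_key_blocks("attention long documents", doc, block_size=4, max_tokens=8))
```

With a 4-token block size and an 8-token budget, the two blocks matching query terms are kept and the rest are dropped. Because the selected text never exceeds `max_tokens`, the cost of the downstream model stays bounded no matter how long the original document is. In a full pipeline, the concatenated blocks would then be paired with the query and passed to a BERT-style ranker.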
Subjects: Information Retrieval (cs.IR)
Cite as: arXiv:2111.09852 [cs.IR]
  (or arXiv:2111.09852v1 [cs.IR] for this version)

Submission history

From: Minghan Li
[v1] Thu, 18 Nov 2021 18:25:24 GMT (2062kb,D)
[v2] Tue, 21 Dec 2021 18:16:12 GMT (2058kb,D)
