InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

Boytsov, Leonid; Patel, Preksha; Sourabh, Vivek; Nisar, Riddhi; Kundu, Sayani; Ramanathan, Ramya; Nyberg, Eric

(Submitted on 8 Jan 2023 (v1), last revised 21 Feb 2024 (this version, v2))

Abstract: We carried out a reproducibility study of InPars, a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-Light, a simple yet effective modification of InPars. Unlike InPars, InPars-Light uses 7x-100x smaller ranking models and only the freely available language model BLOOM, which, as we found, produced more accurate rankers than the proprietary GPT-3 model. On all five English retrieval collections used in the original InPars study, we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M-parameter, six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only the 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (which is still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M-parameter DeBERTa v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs. 1.32); in fact, on three out of five datasets, DeBERTa slightly outperformed monoT5-3B. Finally, these results were achieved by re-ranking only 100 candidate documents, compared to the 1000 used by Bonifacio et al. (2022). We believe that InPars-Light is the first truly cost-effective prompt-based unsupervised recipe for training and deploying neural ranking models that outperform BM25. Our code and data are publicly available at this https URL.
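
The abstract reports improvements over BM25 in nDCG and MRR, and an "average gain over BM25" expressed as a ratio. The sketch below illustrates how these quantities are commonly computed for one query; it is not the authors' evaluation code. The relevance lists, the cutoff k=5, the linear-gain form of DCG, and all function names are assumptions made for illustration, and the ideal ranking is taken from the ranked list alone rather than from all judged documents.

```python
import math


def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels, in ranked order, for a single query.
bm25_ranking = [0, 0, 1, 0, 1]      # first-stage BM25 order
reranked = [1, 0, 1, 0, 0]          # same candidates after neural re-ranking

k = 5  # illustrative cutoff, not taken from the paper
gain = ndcg(reranked, k) / ndcg(bm25_ranking, k)
print(f"RR: BM25={reciprocal_rank(bm25_ranking):.3f}, "
      f"re-ranked={reciprocal_rank(reranked):.3f}")
print(f"nDCG@{k} gain over BM25: {gain:.2f}x")
```

MRR is the mean of the per-query reciprocal ranks, and a collection-level gain is the ratio of the mean metric values; averaging that ratio across collections gives a figure of the same kind as the 1.3 vs. 1.32 comparison in the abstract.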

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2301.02998 [cs.IR]
	(or arXiv:2301.02998v2 [cs.IR] for this version)

Submission history

From: Leonid Boytsov
[v1] Sun, 8 Jan 2023 08:03:46 GMT (73kb)
[v2] Wed, 21 Feb 2024 04:20:55 GMT (65kb,D)
