Title: Noise tolerance of learning to rank under class-conditional label noise

Authors: Dany Haddad
Abstract: Often, the data used to train ranking models is subject to label noise. For example, in web search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the search engine results page (SERP), query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2208.02126 [cs.IR]
  (or arXiv:2208.02126v2 [cs.IR] for this version)

Submission history

From: Dany Haddad
[v1] Wed, 3 Aug 2022 15:04:48 GMT (410kb,D)
[v2] Wed, 17 Aug 2022 15:12:44 GMT (411kb,D)
