Title: Noise tolerance of learning to rank under class-conditional label noise
(Submitted on 3 Aug 2022 (v1), last revised 17 Aug 2022 (this version, v2))
Abstract: Often, the data used to train ranking models is subject to label noise. For example, in web search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the search engine results page (SERP), query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
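To make "class-conditional label noise" concrete, here is a minimal sketch of the noise model only, assuming binary relevance labels (1 = relevant, 0 = not relevant). Under this model the probability that an observed label is flipped depends only on the true class. It does not implement the paper's noise-tolerant losses. The function name `add_class_conditional_noise` and the parameters `rho_pos` and `rho_neg` are hypothetical names chosen for illustration.

```python
import random
from typing import List, Sequence


def add_class_conditional_noise(
    labels: Sequence[int],
    rho_pos: float,
    rho_neg: float,
    seed: int = 0,
) -> List[int]:
    """Corrupt binary relevance labels with class-dependent flip rates (illustrative sketch).

    rho_pos: probability that a truly relevant item (label 1) is observed as 0.
    rho_neg: probability that a truly non-relevant item (label 0) is observed as 1.
    """
    if not (0.0 <= rho_pos <= 1.0 and 0.0 <= rho_neg <= 1.0):
        raise ValueError("flip rates must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    noisy = []
    for y in labels:
        # The flip probability depends only on the true class of the item.
        flip_prob = rho_pos if y == 1 else rho_neg
        noisy.append(1 - y if rng.random() < flip_prob else y)
    return noisy


if __name__ == "__main__":
    clean = [1, 0, 0, 1, 0, 0, 0, 1]
    # Relevant items are missed more often than irrelevant ones are falsely marked relevant.
    print(add_class_conditional_noise(clean, rho_pos=0.3, rho_neg=0.1, seed=42))
```

Setting `rho_pos` larger than `rho_neg`, as in the usage line, loosely mimics clickstream labels in which relevant items often go unclicked while irrelevant items are clicked less often; the specific rates are arbitrary.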
Submission history
From: Dany Haddad
[v1] Wed, 3 Aug 2022 15:04:48 GMT (410kb,D)
[v2] Wed, 17 Aug 2022 15:12:44 GMT (411kb,D)