Title: DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Abstract: Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures that automatically annotate a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework, which addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm, which could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also train efficiently on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications, ranging from matching user queries to advertiser bid phrases to showing personalized ads, where it yielded significant gains in click-through rate, coverage, revenue, and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at this https URL
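
The task described in the abstract, annotating a short text with the most relevant subset of labels from a large label set, can be illustrated with a minimal sketch. This is not the DeepXML or Astec implementation, and it does not reproduce the four sub-tasks, which the abstract does not enumerate. It assumes a plain one-vs-all linear scorer over bag-of-words features; the function names, the toy labels, and the weights are all hypothetical and chosen only for illustration.

```python
import heapq


def score_labels(features, label_weights):
    """Score every label with a dot product between the sparse
    feature vector and that label's sparse weight vector."""
    scores = {}
    for label, weights in label_weights.items():
        scores[label] = sum(
            weights.get(token, 0.0) * value for token, value in features.items()
        )
    return scores


def top_k_labels(features, label_weights, k=2):
    """Return the k highest-scoring labels, best first."""
    scores = score_labels(features, label_weights)
    return heapq.nlargest(k, scores, key=scores.get)


def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted labels that are truly relevant."""
    hits = sum(1 for label in predicted[:k] if label in relevant)
    return hits / k


# Hypothetical toy label set: each label (e.g. a bid phrase) has sparse weights.
label_weights = {
    "running shoes": {"running": 1.0, "shoes": 1.0},
    "trail shoes": {"trail": 1.0, "shoes": 0.8},
    "laptop bags": {"laptop": 1.0, "bag": 1.0},
}

# Hypothetical short-text query represented as a bag of words.
query = {"trail": 1.0, "running": 1.0, "shoes": 1.0}

predicted = top_k_labels(query, label_weights, k=2)
print(predicted)
print(precision_at_k(predicted, {"trail shoes"}, k=2))
```

Scoring every label, as this sketch does, costs time linear in the number of labels per query. That cost is what becomes impractical at the scale the abstract mentions (up to 62 million labels), and it is why scalability is listed alongside accuracy as a core challenge.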
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
ACM classes: F.2.2; I.2.7
Journal reference: Web Search and Data Mining 2021
DOI: 10.1145/3437963.3441810
Cite as: arXiv:2111.06685 [cs.LG]
  (or arXiv:2111.06685v1 [cs.LG] for this version)

Submission history

From: Kunal Dahiya
[v1] Fri, 12 Nov 2021 12:25:23 GMT (1240kb,D)