Title: Binary Code based Hash Embedding for Web-scale Applications

Abstract: Deep learning models are now widely adopted in web-scale applications such as recommender systems and online advertising. In these applications, learning embeddings for categorical features is crucial to model quality. The standard approach assigns each categorical feature value its own embedding vector, which is learned and optimized during training. Although this approach captures the characteristics of categorical features well and delivers good performance, storing the embedding table can incur a huge memory cost, especially at web scale. This memory cost significantly limits the effectiveness and usability of embedding-based deep recommendation models (EDRMs). In this paper, we propose a binary code based hash embedding method that allows the size of the embedding table to be reduced by an arbitrary factor without compromising too much performance. Experimental results show that our method still achieves 99% of the original performance even when the embedding table is 1000× smaller than the original.
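
The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way a binary-code-driven embedding lookup could shrink an embedding table. It assumes that a feature value's integer ID is written as a fixed-width binary code, that the code is split into consecutive blocks, and that each block indexes a small per-block table whose rows are summed. The names `binary_code_blocks` and `BlockHashEmbedding`, the block layout, and the summation are all illustrative assumptions, not the authors' implementation.

```python
import random


def binary_code_blocks(value_id: int, num_bits: int, block_bits: int) -> list[int]:
    """Split the num_bits-wide binary code of value_id into consecutive blocks.

    Each block is returned as an integer index, lowest-order block first.
    (Hypothetical helper; the block layout is an assumption.)
    """
    mask = (1 << block_bits) - 1
    num_blocks = -(-num_bits // block_bits)  # ceiling division
    return [(value_id >> (i * block_bits)) & mask for i in range(num_blocks)]


class BlockHashEmbedding:
    """Illustrative compact embedding: one small table per block position."""

    def __init__(self, num_bits: int, block_bits: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_bits = num_bits
        self.block_bits = block_bits
        self.dim = dim
        self.num_blocks = -(-num_bits // block_bits)
        # Each table has 2**block_bits rows instead of 2**num_bits rows.
        self.tables = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)]
             for _ in range(1 << block_bits)]
            for _ in range(self.num_blocks)
        ]

    def lookup(self, value_id: int) -> list[float]:
        """Sum the rows selected by each block of the value's binary code."""
        blocks = binary_code_blocks(value_id, self.num_bits, self.block_bits)
        rows = [self.tables[pos][idx] for pos, idx in enumerate(blocks)]
        return [sum(column) for column in zip(*rows)]

    def num_parameters(self) -> int:
        return self.num_blocks * (1 << self.block_bits) * self.dim


# Usage: 2**20 possible feature values, 8-dimensional embeddings.
emb = BlockHashEmbedding(num_bits=20, block_bits=10, dim=8)
full_table_parameters = (1 << 20) * 8
compression_ratio = full_table_parameters / emb.num_parameters()
vector = emb.lookup(123456)
```

With these illustrative settings the sketch stores 16,384 parameters instead of 8,388,608, roughly a 512-fold reduction, and a smaller `block_bits` shrinks the tables further. It says nothing about how much accuracy is retained; the 99% figure above comes from the paper's own experiments with its own method.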
Comments: CIKM 2021, 5 pages; The first two authors contributed equally to this work
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2109.02471 [cs.IR]
  (or arXiv:2109.02471v1 [cs.IR] for this version)

Submission history

From: Bencheng Yan
[v1] Tue, 24 Aug 2021 11:51:15 GMT (29897kb,D)