Title: Exclusive Topic Modeling

Authors: Hao Lei, Ying Chen
Abstract: We propose Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify field-specific keywords even when they appear infrequently and 2) deliver well-structured topics with exclusive words. In particular, a weighted Lasso penalty is imposed to automatically reduce the dominance of frequently appearing yet less relevant words, and a pairwise Kullback-Leibler divergence penalty is used to enforce topic separation. Simulation studies demonstrate that ETM detects the field-specific keywords, while LDA fails. When applied to the benchmark NIPS dataset, the topic coherence score improves on average by 22% and 10% for the models with the weighted Lasso penalty and the pairwise Kullback-Leibler divergence penalty, respectively.
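
The abstract names the two penalties but does not give the objective, so the following is only a minimal sketch of what such terms could look like, not the authors' formulation. It assumes topics are rows of a topic-word probability matrix; the function names `weighted_l1` and `pairwise_kl`, the `weights` argument, and the `eps` smoothing constant are all hypothetical. How the terms would enter a training objective (signs, tuning parameters) is not stated in the abstract and is left out.

```python
import numpy as np

def weighted_l1(topics, weights):
    """Weighted Lasso-style term: sum over topics k and words w of weights[w] * |topics[k, w]|.

    `weights` is a hypothetical per-word weight vector; larger weights
    shrink the corresponding word more strongly.
    """
    topics = np.asarray(topics, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((np.abs(topics) * weights).sum())

def pairwise_kl(topics, eps=1e-12):
    """Sum of KL(topic_i || topic_j) over all ordered pairs i != j.

    Rows are clipped at `eps` and renormalized so the logarithm is defined.
    """
    t = np.clip(np.asarray(topics, dtype=float), eps, None)
    t = t / t.sum(axis=1, keepdims=True)
    log_t = np.log(t)
    self_term = (t * log_t).sum(axis=1)   # sum_w t[i, w] * log t[i, w]
    cross = t @ log_t.T                   # cross[i, j] = sum_w t[i, w] * log t[j, w]
    kl = self_term[:, None] - cross       # kl[i, j] = KL(topic_i || topic_j)
    return float(kl.sum() - np.trace(kl)) # drop the i == j terms
```

Under these assumptions, identical topics give a pairwise divergence of zero, and the value grows as the topics put their mass on different words, which is the sense in which such a term can reward topic separation.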
Subjects: Machine Learning (stat.ML); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2102.03525 [stat.ML]
  (or arXiv:2102.03525v1 [stat.ML] for this version)

Submission history

From: Hao Lei
[v1] Sat, 6 Feb 2021 07:03:15 GMT (1669kb,D)