We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ML

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Statistics > Machine Learning

Title: Toward Interpretable Topic Discovery via Anchored Correlation Explanation

Abstract: Many predictive tasks, such as diagnosing a patient based on their medical chart, are ultimately defined by the decisions of human experts. Unfortunately, encoding experts' knowledge is often time consuming and expensive. We propose a simple way to use fuzzy and informal knowledge from experts to guide discovery of interpretable latent topics in text. The underlying intuition of our approach is that latent factors should be informative about both correlations in the data and a set of relevance variables specified by an expert. Mathematically, this approach is a combination of the information bottleneck and Total Correlation Explanation (CorEx). We give a preliminary evaluation of Anchored CorEx, showing that it produces more coherent and interpretable topics on two distinct corpora.
Comments: presented at 2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, NY
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:1606.07043 [stat.ML]
  (or arXiv:1606.07043v1 [stat.ML] for this version)

Submission history

From: David Kale [view email]
[v1] Wed, 22 Jun 2016 19:00:38 GMT (301kb,D)

Link back to: arXiv, form interface, contact.