Title: Semi-supervised cross-entropy clustering with information bottleneck constraint

Abstract: In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with the side information. Experiments demonstrate that CEC-IB performs comparably to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, is more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied to discover natural subgroups when the partition-level side information is derived from the top levels of a hierarchical clustering.
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Journal reference: Information Sciences, vol. 421, Dec. 2017, pp. 254-271
DOI: 10.1016/j.ins.2017.07.016
Cite as: arXiv:1705.01601 [cs.LG]
  (or arXiv:1705.01601v1 [cs.LG] for this version)

Submission history

From: Marek Smieja
[v1] Wed, 3 May 2017 20:09:43 GMT (453kb,D)
