Separating populations with wide data: A spectral analysis

Blum, Avrim; Coja-Oghlan, Amin; Frieze, Alan; Zhou, Shuheng

doi:10.1214/08-EJS289

(Submitted on 25 Jun 2007 (v1), last revised 29 Jan 2009 (this version, v2))

Abstract: In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points $n$ and the number of features $K$) needed to correctly perform this partitioning as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
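
As a rough illustration of the setting the abstract describes, the following sketch partitions a small sample from a mixture of two product distributions over binary features in the wide regime $K>n$. It is not the procedure analyzed in the paper: it is a generic spectral bipartition (center the features, take the leading eigenvector of the Gram matrix by power iteration, split by sign), restricted to $k=2$. All names (`sample_two_populations`, `spectral_bipartition`, `agreement`), the mirrored feature probabilities, and the parameter `shift` are assumptions introduced here for the example.

```python
import math
import random


def sample_two_populations(n, K, shift, rng):
    """Draw n points with K binary features from two product distributions.

    Hypothetical setup: population 0 has feature probabilities 0.5 +/- shift,
    population 1 has the mirrored probabilities 1 - p.
    """
    p0 = [0.5 + shift * rng.choice((-1, 1)) for _ in range(K)]
    labels = [i % 2 for i in range(n)]  # balanced mixture, hidden from the algorithm
    X = []
    for lab in labels:
        row = []
        for p in p0:
            prob = p if lab == 0 else 1.0 - p
            row.append(1.0 if rng.random() < prob else 0.0)
        X.append(row)
    return X, labels


def spectral_bipartition(X, iters=100, rng=None):
    """Split the rows of X into two groups by the sign of the leading
    eigenvector of the centered Gram matrix (found by power iteration)."""
    rng = rng or random.Random(0)
    n, K = len(X), len(X[0])
    # Center each feature so the leading direction reflects group differences.
    means = [sum(X[i][j] for i in range(n)) / n for j in range(K)]
    Xc = [[X[i][j] - means[j] for j in range(K)] for i in range(n)]
    # n x n Gram matrix; cheap when n is small even if K is large.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
         for i in range(n)]
    # Power iteration from a random start vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [1 if x > 0 else 0 for x in v]


def agreement(pred, truth):
    """Fraction of matching labels, up to swapping the two group names."""
    same = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(same, 1.0 - same)


rng = random.Random(0)
X, truth = sample_two_populations(n=40, K=400, shift=0.2, rng=rng)
pred = spectral_bipartition(X)
print(agreement(pred, truth))
```

Working with the $n \times n$ Gram matrix rather than the $K \times K$ covariance keeps the cost low when $K>n$. Shrinking `shift` makes each feature less informative, and the sketch then needs more features to separate the two groups.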

Comments:	Published in the Electronic Journal of Statistics by the Institute of Mathematical Statistics
Subjects:	Machine Learning (stat.ML); Applications (stat.AP)
MSC classes:	60K35, 60K35 (Primary), 60K35 (Secondary)
Journal reference:	Electronic Journal of Statistics 2009, Vol. 3, 76-113
DOI:	10.1214/08-EJS289
Report number:	IMS-EJS-EJS_2008_289
Cite as:	arXiv:0706.3434 [stat.ML]
	(or arXiv:0706.3434v2 [stat.ML] for this version)

Submission history

From: Shuheng Zhou
[v1] Mon, 25 Jun 2007 08:03:25 GMT (40kb)
[v2] Thu, 29 Jan 2009 11:31:54 GMT (145kb,S)
