We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: Generating Multidimensional Clusters With Support Lines

Abstract: Synthetic data is essential for assessing clustering techniques, complementing and extending real data, and allowing for a more complete coverage of a given problem's space. In turn, synthetic data generators have the potential of creating vast amounts of data -- a crucial activity when real-world data is at premium -- while providing a well-understood generation procedure and an interpretable instrument for methodically investigating cluster analysis algorithms. Here, we present \textit{Clugen}, a modular procedure for synthetic data generation, capable of creating multidimensional clusters supported by line segments using arbitrary distributions. \textit{Clugen} is open source, 100\% unit tested and fully documented, and is available for the Python, R, Julia and MATLAB/Octave ecosystems. We demonstrate that our proposal is able to produce rich and varied results in various dimensions, is fit for use in the assessment of clustering algorithms, and has the potential to be a widely used framework in diverse clustering-related research tasks.
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Programming Languages (cs.PL)
ACM classes: I.5; I.2.5
Cite as: arXiv:2301.10327 [cs.LG]
  (or arXiv:2301.10327v1 [cs.LG] for this version)

Submission history

From: Nuno Fachada [view email]
[v1] Tue, 24 Jan 2023 22:08:24 GMT (12970kb,D)

Link back to: arXiv, form interface, contact.