Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

Mussabayev, Rustam; Mussabayev, Ravil

(Submitted on 27 Mar 2024)

Abstract: This paper introduces a novel K-means clustering algorithm that advances the conventional Big-means methodology. The proposed method integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses the scalability and computation-time challenges typically faced by traditional techniques. The algorithm adjusts the sample size of each worker dynamically during execution, and the results obtained with the different sample sizes are continually analyzed to identify the most efficient configuration. Competition among workers using different sample sizes further improves the efficiency of Big-means. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
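
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (kmeans, worker, competitive_big_means), the candidate sample sizes, the per-point normalization of the objective, the handling of empty clusters, and the final comparison on the full dataset are all assumptions made for illustration. Each worker repeatedly draws a random sample whose size is picked at random from a candidate list, refines its incumbent centroids with K-means on that sample, and keeps the result only if the per-point objective improves. The workers then compete on clustering quality.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_cost(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)

def kmeans(points, centroids, iters=10):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        # Assumption: an empty cluster simply keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, mean_cost(points, centroids)

def worker(data, k, sizes, rounds, seed):
    """One worker: stochastic sample sizes, incumbent kept only on improvement."""
    rng = random.Random(seed)          # private RNG, so runs are reproducible
    best_centroids = rng.sample(data, k)
    best_cost, best_size = float("inf"), None
    for _ in range(rounds):
        size = rng.choice(sizes)       # hypothetical rule: uniform random choice
        sample = rng.sample(data, size)
        centroids, cost = kmeans(sample, best_centroids)
        if cost < best_cost:
            best_centroids, best_cost, best_size = centroids, cost, size
    return best_size, best_centroids

def competitive_big_means(data, k, sizes, n_workers=4, rounds=20, seed=0):
    """Run the workers in a thread pool and return the winner's result."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda w: worker(data, k, sizes, rounds, seed + w), range(n_workers)))
    # Assumption: the competition is decided on the full dataset.
    scored = [(mean_cost(data, c), s, c) for s, c in results]
    return min(scored, key=lambda r: r[0])

# Synthetic demo data: three Gaussian blobs in 2D.
gen = random.Random(42)
blobs = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
data = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5)) for cx, cy in blobs for _ in range(100)]
cost, size, centroids = competitive_big_means(data, k=3, sizes=[30, 75, 150])
print(f"winning sample size: {size}, mean squared distance: {cost:.3f}")
```

Threads are used here only to keep the sketch short. In CPython they do not speed up CPU-bound work, so a process pool or a compiled kernel would be the more realistic choice for real parallel speedup.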

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
Cite as:	arXiv:2403.18766 [cs.LG]
	(or arXiv:2403.18766v1 [cs.LG] for this version)

Submission history

From: Ravil Mussabayev [view email]
[v1] Wed, 27 Mar 2024 17:05:03 GMT (57kb)
