Quantitative Biology > Quantitative Methods
Title: BigC: rapid, scalable and accurate clustering of single-cell RNA-seq data
(Submitted on 25 May 2022 (this version), latest version 7 Jul 2022 (v2))
Abstract: Identifying cell clusters is a critical step in single-cell transcriptomics studies. With the rapid growth in the volume of scRNA-seq data, efficient clustering methods are required. Although numerous approaches have been developed, they are computationally inefficient and scale poorly. In this work, we introduce BigC, an improved spectral clustering algorithm for efficiently and accurately clustering scRNA-seq data. By employing a sub-matrix representative strategy and a scaled exponential similarity kernel function, our method drastically reduces clustering time. We demonstrate that BigC achieves accuracy better than or comparable to that of other state-of-the-art methods on 15 benchmark datasets at orders of magnitude lower computational cost, especially for large datasets of over a million cells. BigC scales to ultra-large datasets of over 10 million cells while consistently recovering an accurate number of cell clusters. Furthermore, we demonstrate that BigC can be used to build a consensus clustering method, BigCC, which greatly improves the runtime and scalability of state-of-the-art methods while maintaining accuracy.
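The abstract names two ingredients, a sub-matrix of representative cells and a scaled exponential similarity kernel, without giving their exact form. The Python sketch below illustrates one plausible reading and is not BigC's implementation. Assumptions: representatives are a random subset of cells; the kernel follows the form popularized by similarity network fusion (SNF), W_ij = exp(-d_ij^2 / (mu * eps_ij)) with eps_ij = (rho_i + sigma_j + d_ij) / 3, where rho_i is a cell's mean distance to its k nearest representatives and sigma_j is a representative's mean distance to its k nearest cells (this bipartite split is our own adaptation); and the spectral step follows landmark-based spectral clustering, i.e. an SVD of the degree-normalized n x p cell-to-representative matrix followed by k-means. All function and parameter names (`scaled_exp_kernel`, `representative_spectral_clustering`, `n_reps`, `r`, `knn`, `mu`) are hypothetical.

```python
import numpy as np


def scaled_exp_kernel(D, knn=5, mu=0.5):
    """Scaled exponential kernel on an n x p cell-to-representative distance matrix D.

    W_ij = exp(-D_ij^2 / (mu * eps_ij)), eps_ij = (rho_i + sigma_j + D_ij) / 3 (SNF-style form).
    Assumption: rho uses each cell's nearest representatives, sigma each representative's nearest cells.
    """
    k_row, k_col = min(knn, D.shape[1]), min(knn, D.shape[0])
    rho = np.sort(D, axis=1)[:, :k_row].mean(axis=1)    # cell i: mean distance to its k nearest reps
    sigma = np.sort(D, axis=0)[:k_col, :].mean(axis=0)  # rep j: mean distance to its k nearest cells
    eps = (rho[:, None] + sigma[None, :] + D) / 3.0
    return np.exp(-D**2 / (mu * np.maximum(eps, 1e-12)))


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Plain Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        C = X[[rng.integers(len(X))]]  # first center chosen uniformly, shape (1, d)
        for _ in range(1, k):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else None
            C = np.vstack([C, X[rng.choice(len(X), p=p)]])
        for _ in range(n_iter):
            labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
            new_C = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                              for c in range(k)])
            converged = np.allclose(new_C, C)
            C = new_C
            if converged:
                break
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((X - C[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def representative_spectral_clustering(X, n_clusters, n_reps=100, r=5, knn=5, mu=0.5, seed=0):
    """Hypothetical landmark-style spectral clustering: only an n x p affinity matrix is formed."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = min(n_reps, n)
    reps = X[rng.choice(n, size=p, replace=False)]  # assumption: representatives = random cells
    sq = (X**2).sum(1)[:, None] + (reps**2).sum(1)[None, :] - 2.0 * X @ reps.T
    D = np.sqrt(np.maximum(sq, 0.0))                # n x p Euclidean distances
    W = scaled_exp_kernel(D, knn=knn, mu=mu)
    # Sparsify: each cell keeps only its r nearest representatives.
    nearest = np.argsort(D, axis=1)[:, :min(r, p)]
    rows = np.arange(n)[:, None]
    Z = np.zeros_like(W)
    Z[rows, nearest] = W[rows, nearest]
    Z /= np.maximum(Z.sum(axis=1, keepdims=True), 1e-12)  # each row sums to 1
    # Scale columns by representative degree^(-1/2), then take the top left singular vectors.
    Z_hat = Z / np.sqrt(np.maximum(Z.sum(axis=0), 1e-12))
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    emb = U[:, :n_clusters]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return kmeans(emb, n_clusters, rng)
```

The design point this sketch is meant to show: for a fixed number of representatives p, memory and the SVD cost grow linearly with the number of cells n, whereas a full n x n affinity matrix would be infeasible at the 10-million-cell scale the abstract reports. In practice the input `X` would be a reduced expression matrix (for example, principal components of normalized counts) rather than raw counts.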
Submission history
From: Nana Wei
[v1] Wed, 25 May 2022 01:40:41 GMT (2559kb)
[v2] Thu, 7 Jul 2022 13:27:03 GMT (3076kb)