
Title: BigC: rapid, scalable and accurate clustering of single-cell RNA-seq data

Abstract: Identifying cell clusters is a critical step in single-cell transcriptomics studies. With the rapid growth of scRNA-seq data volumes, efficient clustering methods are required. Although numerous approaches have been developed, they are computationally inefficient and scale poorly. In this work, we introduce BigC, an improved spectral clustering algorithm for efficiently and accurately clustering scRNA-seq data. By employing a sub-matrix representative strategy and a scaled exponential similarity kernel function, our method drastically reduces clustering time. We demonstrate that BigC achieves better or comparable accuracy relative to other state-of-the-art methods on 15 benchmark datasets at orders of magnitude lower computational cost, especially for large datasets of over a million cells. BigC scales to ultra-large datasets of over 10 million cells while preserving a consistent and accurate count of cell clusters. Furthermore, we demonstrate that BigC can be used to develop a consensus clustering method, BigCC, which greatly improves on the runtime and scalability of state-of-the-art methods while maintaining accuracy.
Subjects: Quantitative Methods (q-bio.QM); Genomics (q-bio.GN)
Cite as: arXiv:2205.12432 [q-bio.QM]
  (or arXiv:2205.12432v1 [q-bio.QM] for this version)

Submission history

From: Nana Wei
[v1] Wed, 25 May 2022 01:40:41 GMT (2559kb)
[v2] Thu, 7 Jul 2022 13:27:03 GMT (3076kb)
