Quantitative Biology > Quantitative Methods
Title: Counting unique molecular identifiers in sequencing using a multitype branching process with immigration
(Submitted on 13 May 2022 (v1), last revised 5 Jun 2022 (this version, v2))
Abstract: Detection of extremely rare variant alleles, such as tumour DNA, within a complex mixture of DNA molecules is experimentally challenging due to sequencing errors. Barcoding of target DNA molecules in library construction for next-generation sequencing provides a way to identify and bioinformatically remove polymerase-induced errors. During the barcoding procedure involving $t$ consecutive PCR cycles, the DNA molecules become barcoded by unique molecular identifiers (UMIs). Different library construction protocols utilise different values of $t$. The effect of a larger $t$ and imperfect PCR amplifications is poorly described.
This paper proposes a branching process with growing immigration as a model describing the random outcome of $t$ cycles of PCR barcoding. Our model discriminates between five different amplification rates $r_1$, $r_2$, $r_3$, $r_4$, $r$ for different types of molecules associated with the PCR barcoding procedure. We study this model by focussing on $C_t$, the number of clusters of molecules sharing the same UMI, as well as $C_t(m)$, the number of UMI clusters of size $m$. Our main finding is a remarkable asymptotic pattern valid for moderately large $t$. It turns out that $E(C_t(m))/E(C_t)\approx 2^{-m}$ for $m=1,2,\ldots$, regardless of the underlying parameters $(r_1,r_2,r_3,r_4,r)$. The knowledge of the quantities $C_t$ and $C_t(m)$ as functions of the experimental parameters $t$ and $(r_1,r_2,r_3,r_4,r)$ will help the users to draw more adequate conclusions from the outcomes of different sequencing protocols.
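As a small numerical illustration of the stated pattern $E(C_t(m))/E(C_t)\approx 2^{-m}$, the sketch below tabulates the predicted fractions of UMI clusters of each size $m$. It does not simulate the branching process itself, and the function name `predicted_cluster_fractions` and the cut-off `max_size` are hypothetical choices made for illustration only. It assumes the approximation holds exactly for every $m$, which the abstract claims only approximately and for moderately large $t$.

```python
def predicted_cluster_fractions(max_size):
    """Return the approximate fractions 2**-m for cluster sizes m = 1..max_size.

    Assumption: the pattern E(C_t(m)) / E(C_t) ~ 2**-m is treated as exact.
    """
    return [2.0 ** -m for m in range(1, max_size + 1)]


fractions = predicted_cluster_fractions(10)

# The fractions form a geometric sequence with ratio 1/2, so the first
# max_size terms sum to 1 - 2**-max_size, approaching 1 as max_size grows.
total = sum(fractions)

# Mean cluster size implied by the truncated sequence; the untruncated
# geometric sequence would give a mean of exactly 2.
mean_size = sum(m * f for m, f in enumerate(fractions, start=1))

for m, f in enumerate(fractions, start=1):
    print(f"m = {m:2d}: predicted fraction of clusters = {f:.6f}")
print(f"sum of fractions = {total:.6f}, implied mean cluster size = {mean_size:.6f}")
```

Under this idealisation about half of all UMI clusters would contain a single molecule, a quarter would contain two, and so on, independently of $(r_1,r_2,r_3,r_4,r)$.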
Submission history
From: Serik Sagitov
[v1] Fri, 13 May 2022 00:33:00 GMT (632kb,D)
[v2] Sun, 5 Jun 2022 10:33:30 GMT (634kb,D)