
Title: Discovering genetic networks using compressive sensing

Abstract: A first analysis applying compressive sensing to a quantitative biological trait and its compressible "frequency domain" is presented. Consider an $n$-bit genetic sequence and suppose we want to discover a function that maps participating alleles (or even environmental influences) to a particular trait. Under plausible assumptions about how they evolved, certain traits can be viewed as "smooth" functions on the $n$-dimensional Boolean lattice of possible genomes. This allows their Fourier transforms, i.e., their gene networks, to be approximated as sparse and dominated by "low-frequency" components. In turn, we can apply compressive sensing methods to collect relatively few samples, yet achieve accurate recovery.
For an arbitrary quantitative trait affected by $n=26$ genes and with $812$ meaningful gene interactions, our simulations show that noisy trait measurements ($\mathrm{SNR}=20\,\mathrm{dB}$) from just $M=44,\!336$ genomes in a population of size $N = 2^{26}$ (undersampling ratio $M/N\approx0.00066$) permit discovering its gene network and predicting trait values, both with about $97.6\%$ accuracy. More dramatic undersampling ratios are possible for traits affected by more genes. Work is currently underway to see whether empirical data fit the proposed model. If so, it could offer a radical reduction in the number of measurements -- from exponential to polynomial in some cases -- necessary to quantify the relationship between genomes and certain traits.
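
The sparsity idea in the abstract can be illustrated with a small sketch. It assumes the "Fourier transform" on the Boolean lattice is the Walsh-Hadamard transform, uses a hypothetical toy trait on $n=4$ genes (the coefficients below are invented for illustration, not taken from the paper), and does not implement the compressive sensing recovery step itself. It also reproduces the undersampling ratio quoted above.

```python
import numpy as np

def walsh_hadamard(f):
    """Fast Walsh-Hadamard transform of a length-2^n vector, scaled by 1/N."""
    a = np.asarray(f, dtype=float).copy()
    size = a.size
    h = 1
    while h < size:
        for i in range(0, size, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y          # butterfly: sum
            a[i + h:i + 2 * h] = x - y  # butterfly: difference
        h *= 2
    return a / size

# Enumerate all 2^n genomes of a toy n-bit sequence (n = 4 is an assumption).
n = 4
N = 2 ** n
genomes = np.arange(N)
bits = (genomes[:, None] >> np.arange(n)) & 1  # shape (N, n), bit j = allele j
s = 1 - 2 * bits                               # map alleles {0, 1} to {+1, -1}

# Hypothetical trait: two single-gene effects plus one pairwise interaction.
trait = 1.0 * s[:, 0] - 0.5 * s[:, 2] + 0.25 * s[:, 0] * s[:, 1]

# Index k of a coefficient is the bitmask of the genes taking part in that term.
coeffs = walsh_hadamard(trait)
support = np.flatnonzero(np.abs(coeffs) > 1e-12)

# Undersampling ratio from the abstract: M genomes sampled out of N = 2^26.
M = 44336
ratio = M / 2 ** 26
```

Here only three of the sixteen coefficients are nonzero (bitmasks 1, 3 and 4, i.e. gene 0, the pair of genes 0 and 1, and gene 2), which is the kind of sparse, low-order structure that compressive sensing exploits. The ratio evaluates to roughly 0.00066, matching the figure in the abstract.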
Comments: 30 pages, 6 figures. Preprint submitted to Journal of Theoretical Biology
Subjects: Quantitative Methods (q-bio.QM); Information Theory (cs.IT); Molecular Networks (q-bio.MN)
MSC classes: 92C42, 92D20 (Primary), 68P30, 94D10, 17D92 (Secondary)
ACM classes: J.3; E.4
Cite as: arXiv:2101.01234 [q-bio.QM]
  (or arXiv:2101.01234v1 [q-bio.QM] for this version)

Submission history

From: Matthew Herman
[v1] Mon, 4 Jan 2021 20:54:43 GMT (333kb,D)
