
Title: Low-bandwidth and non-compute intensive remote identification of microbes from raw sequencing reads

Abstract: Cheap high-throughput DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples.
We propose a novel general approach to the analysis of sequencing data in which the reference genome does not have to be specified. Using a distributed architecture, a client queries a remote server for hints about what the reference might be while transferring only a relatively small amount of data; the hints can then be used for more computationally demanding work.
Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references known to the server. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment.
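The exchange described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `sample_reads` and `ToyReferenceIndex` are hypothetical, the k-mer lookup is an assumed stand-in for whatever indexing the server actually uses, and the network transfer is replaced by a direct function call.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def sample_reads(reads: Iterable[str], n: int, seed: int = 0) -> List[str]:
    """Reservoir-sample up to n reads in one pass (client side).

    Only this small sample would be sent to the server, so the full
    set of reads never has to be held in memory or transferred.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, read in enumerate(reads):
        if i < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class ToyReferenceIndex:
    """Hypothetical server stand-in: maps k-mers to reference names."""

    def __init__(self, references: Dict[str, str], k: int = 12) -> None:
        self.k = k
        self.index: Dict[str, Set[str]] = {}
        for name, seq in references.items():
            for kmer in kmers(seq, k):
                self.index.setdefault(kmer, set()).add(name)

    def match(self, reads: Iterable[str]) -> List[Tuple[str, int]]:
        """Return (reference, matching read count) pairs, best first.

        A read counts as matching a reference if they share at least
        one k-mer.
        """
        hits: Counter = Counter()
        for read in reads:
            names: Set[str] = set()
            for kmer in kmers(read, self.k):
                names |= self.index.get(kmer, set())
            hits.update(names)
        return hits.most_common()
```

In this sketch a client would call `sample_reads(read_fastq_sequences(open_file), n)` and pass the result to `match`; the returned list of reference names is the "hint" that tells the client which reference sequences to retrieve for alignment.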
To demonstrate this approach we have implemented a web server that indexes tens of thousands of publicly available genomes and genomic regions from various organisms and returns lists of matching hits for query sequencing reads. We have also implemented two clients, one of them running in a web browser, to demonstrate that gigabytes of raw sequencing reads of unknown origin can be identified on modestly powered computing devices without transferring a very large volume of data.
Web access is available at this http URL. The source code for a Python command-line client, a server, and supplementary data is available at this http URL.
Subjects: Genomics (q-bio.GN)
DOI: 10.1371/journal.pone.0083784
Cite as: arXiv:1306.1569 [q-bio.GN]
  (or arXiv:1306.1569v2 [q-bio.GN] for this version)

Submission history

From: Laurent Gautier
[v1] Thu, 6 Jun 2013 22:37:36 GMT (975kb,D)
[v2] Fri, 21 Jun 2013 13:51:09 GMT (975kb,D)
