Title: Haplotype Assembly: An Information Theoretic View

Abstract: This paper studies the haplotype assembly problem from an information-theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differ from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over an erasure channel, where the channel model reflects salient features of high-throughput sequencing. In the absence of sequencing noise, necessary and sufficient conditions for perfect haplotype recovery are presented, with order-wise optimal bounds. A brief discussion of the case with sequencing errors is also included.
Comments: 5 pages, 3 figures, conference
Subjects: Information Theory (cs.IT)
Cite as: arXiv:1404.0097 [cs.IT]
  (or arXiv:1404.0097v1 [cs.IT] for this version)

Submission history

From: Hongbo Si
[v1] Tue, 1 Apr 2014 01:32:07 GMT (76kb)
[v2] Sun, 11 May 2014 19:49:47 GMT (144kb)