Title: Statistical linguistic study of DNA sequences

Authors: K. L. Ng, S. P. Li
Abstract: A new family of compound Poisson distribution functions from statistical linguistics is used to study the n-tuple and nucleotide composition features of DNA sequences. The relative frequency distributions of 6-tuple and 7-tuple occurrences suggest that most of the DNA sequences follow the general shape of the compound Poisson distribution. The $\chi$-square test indicates that some of the sequences follow this distribution with a reasonable level of goodness of fit. The compositional segmentation results are also fitted quite well by this new family of distribution functions. Furthermore, the absolute values of the relative frequencies come out naturally from the linguistic model without ambiguity. These results suggest that DNA sequences are not random sequences and that they could possibly have subsequence structures.
Comments: 19 pages
Subjects: Statistical Mechanics (cond-mat.stat-mech); Soft Condensed Matter (cond-mat.soft); Biological Physics (physics.bio-ph); Biomolecules (q-bio.BM)
Cite as: arXiv:cond-mat/0308128 [cond-mat.stat-mech]
  (or arXiv:cond-mat/0308128v1 [cond-mat.stat-mech] for this version)
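
As a rough illustration of the quantity the abstract refers to, the sketch below computes an empirical relative frequency distribution of n-tuple occurrences: the fraction of all possible n-tuples that appear exactly r times in a sequence. It is a minimal sketch and not the authors' procedure. The function names, the use of overlapping windows, and the four-letter alphabet are assumptions made for illustration, and the compound Poisson fit itself is not reproduced because the abstract does not specify its form.

```python
from collections import Counter


def ntuple_counts(seq, n, alphabet="ACGT"):
    """Count overlapping n-tuples in seq (assumption: overlapping windows).

    Windows containing symbols outside the alphabet (e.g. 'N') are skipped.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    windows = (seq[i:i + n] for i in range(len(seq) - n + 1))
    return Counter(w for w in windows if set(w) <= allowed)


def occurrence_distribution(counts, n, alphabet="ACGT"):
    """Map r -> fraction of all possible n-tuples that occur exactly r times."""
    total = len(alphabet) ** n          # number of distinct possible n-tuples
    freq_of_freq = Counter(counts.values())
    freq_of_freq[0] = total - len(counts)  # n-tuples that never occur
    return {r: k / total for r, k in sorted(freq_of_freq.items())}


if __name__ == "__main__":
    # Toy sequence for illustration only; real studies use full genomes.
    toy = "ACGTACGT"
    dist = occurrence_distribution(ntuple_counts(toy, 2), 2)
    for r, p in dist.items():
        print(r, p)
```

For the 6-tuples and 7-tuples mentioned in the abstract, `n` would be set to 6 or 7, and the resulting distribution could then be compared with a candidate model using a goodness-of-fit test.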

Submission history

From: Ka-Lok Ng
[v1] Thu, 7 Aug 2003 09:22:47 GMT (274kb)
