Title: Dictionary based methods for information extraction

Abstract: In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called "dictionary" of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and a study of their features can be very useful for investigating the properties of the sequences they have been extracted from (e.g. DNA strings). We then describe a procedure of string comparison between dictionary-created sequences (or "artificial texts") that gives very good results in several contexts. We finally present some results on self-consistent classification problems.
Comments: 7 pages, LaTeX, elsart style
Subjects: Statistical Mechanics (cond-mat.stat-mech); Other Condensed Matter (cond-mat.other); Information Retrieval (cs.IR); Genomics (q-bio.GN); Other Quantitative Biology (q-bio.OT)
Journal reference: Physica A - Vol 342/1-2 pp 294-300 (2004)
DOI: 10.1016/j.physa.2004.01.072
Cite as: arXiv:cond-mat/0402581 [cond-mat.stat-mech]
  (or arXiv:cond-mat/0402581v2 [cond-mat.stat-mech] for this version)
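
As a rough illustration of the two objects named in the abstract, the sketch below extracts a dictionary from a sequence and builds an artificial text from it. The abstract does not say which compressor defines the dictionary, so the LZ78-style incremental parse used here is an assumption, as are the function names `lz78_dictionary` and `artificial_text`, the uniform random sampling of words, and the `seed` parameter. The paper's own procedure may differ.

```python
import random


def lz78_dictionary(sequence: str) -> list[str]:
    """Return the phrases of an LZ78-style incremental parse.

    Assumption: this parse stands in for the "dictionary" of the abstract,
    which does not name a specific compression scheme.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        # A phrase ends as soon as it is one symbol longer than a known phrase.
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    # A trailing fragment that is already in the dictionary is dropped.
    return phrases


def artificial_text(phrases: list[str], n_words: int, seed: int = 0) -> str:
    """Concatenate n_words dictionary words drawn uniformly at random.

    Assumption: uniform sampling is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)
    return "".join(rng.choice(phrases) for _ in range(n_words))


if __name__ == "__main__":
    words = lz78_dictionary("ABABABA")
    print(words)
    print(artificial_text(words, n_words=5, seed=1))
```

For example, `lz78_dictionary("ABABABA")` yields `["A", "B", "AB", "ABA"]`. Artificial texts built this way from two different sources could then be compared with any string-comparison procedure, which this sketch does not attempt to reproduce.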

Submission history

From: Andrea Baronchelli
[v1] Tue, 24 Feb 2004 11:34:53 GMT (93kb)
[v2] Tue, 14 Sep 2004 10:54:32 GMT (93kb)
