Condensed Matter > Statistical Mechanics
Title: Dictionary based methods for information extraction
(Submitted on 24 Feb 2004 (v1), last revised 14 Sep 2004 (this version, v2))
Abstract: In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called "dictionary" of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences they have been extracted from (e.g. DNA strings). We then describe a procedure for comparing sequences created from dictionaries (or "artificial texts") that gives very good results in several contexts. We finally present some results on self-consistent classification problems.
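The abstract does not say which compression scheme defines the dictionary. As an illustration only, the sketch below assumes an LZ78-style incremental parsing, in which the dictionary is the list of distinct phrases the parser produces while scanning the sequence. The function name `lz78_dictionary` is hypothetical and is not taken from the paper; the paper's own definition may differ.

```python
def lz78_dictionary(sequence):
    """Return the phrases of an LZ78-style incremental parsing, in order.

    Assumption: the "dictionary" is modelled here as the set of phrases
    produced by LZ78-style parsing. This is an illustrative choice, not
    necessarily the definition used in the paper.
    """
    seen = set()      # phrases already in the dictionary
    phrases = []      # same phrases, in order of first appearance
    current = ""      # phrase being grown symbol by symbol

    for symbol in sequence:
        current += symbol
        if current not in seen:
            # Shortest prefix not seen before: it becomes a new entry.
            seen.add(current)
            phrases.append(current)
            current = ""

    # A trailing fragment that is already in the dictionary is dropped.
    return phrases


if __name__ == "__main__":
    print(lz78_dictionary("abababab"))   # ['a', 'b', 'ab', 'aba']
    print(lz78_dictionary("AACGAACG"))   # ['A', 'AC', 'G', 'AA', 'C']
```

Longer entries appear only when a sequence repeats itself, which is why the contents of such a dictionary carry information about the structure of the source sequence.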
Submission history
From: Andrea Baronchelli
[v1] Tue, 24 Feb 2004 11:34:53 GMT (93kb)
[v2] Tue, 14 Sep 2004 10:54:32 GMT (93kb)