Title :
Zipping out relevant information
Author :
Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio
Author_Institution :
Math. Dept., La Sapienza Univ., Rome, Italy
Abstract :
Although the abundance and accessibility of information represent an important cultural advance, they also introduce a new challenge: retrieving relevant information. However, the growing body of available data provides an ideal test bed for theoretical constructions and models. This opportunity has stimulated considerable interest from researchers in many different communities (physicists, mathematicians, economists, and statisticians, to name a few). In this spirit, we seek to discover the most suitable tools for examining large masses of data and extracting useful information from them. The information-theoretic method described in this article applies to any corpus of character strings, independent of the type of coding behind it. The method has great potential for fields where human intuition might fail: DNA and protein sequences, geological time series, stock market data, medical monitoring, and so on.
Keywords :
information needs; information retrieval; information theory; scientific information systems; DNA sequences; geological time series; information extraction; information theoretic method; medical monitoring; protein sequences; relevant information retrieval; stock market data; Cultural differences; DNA; Data mining; Geology; Global communication; Humans; Information retrieval; Proteins; Sequences; Testing
Journal_Title :
Computing in Science & Engineering
DOI :
10.1109/MCISE.2003.1166556