DocumentCode
2300542
Title
Contextual information retrieval based on algorithmic information theory and statistical outlier detection
Author
Martinez, Rafael ; Cebrián, Manuel ; De Borja Rodríguez, Francisco ; Camacho, David
Author_Institution
Dept. de Ing. Inf., Univ. Autónoma de Madrid, Madrid
fYear
2008
fDate
5-9 May 2008
Firstpage
292
Lastpage
297
Abstract
This work presents an information retrieval technique based on algorithmic information theory (using the normalized compression distance), statistical data outlier detection, and a novel database structure. The paper shows how these components can be integrated to retrieve information from generic databases using long text-based queries. Two important problems are addressed. On the one hand, we analyze and try to solve the detection of a particular case of false positives: when the distance between two documents is outlyingly low but there is no actual similarity. On the other hand, we propose a way to structure the database such that the similarity distance estimation scales well with the length of the query. All design choices are justified with an experimental evaluation.
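As an illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), assuming zlib as the stand-in compressor C, and flags distances that are outlyingly low using a simple z-score rule. The z-score rule, the threshold of 2.0, and the helper names (compressed_size, ncd, low_outliers) are hypothetical choices for illustration; this record does not describe the paper's actual outlier test or database structure.

```python
import statistics
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def low_outliers(distances, threshold=2.0):
    """Return indices of distances lying more than `threshold` sample standard
    deviations below the mean (hypothetical z-score rule, not the paper's test)."""
    mean = statistics.mean(distances)
    stdev = statistics.stdev(distances)
    if stdev == 0:
        return []  # all distances identical: nothing stands out
    return [i for i, d in enumerate(distances) if (mean - d) / stdev > threshold]


if __name__ == "__main__":
    query = b"the quick brown fox jumps over the lazy dog " * 20
    docs = [query, b"lorem ipsum dolor sit amet consectetur " * 20]
    dists = [ncd(query, d) for d in docs]
    print(dists)  # the identical document gets a much smaller distance
```

In the setting the abstract describes, an index returned by low_outliers would mark a candidate whose distance to the query is suspiciously low relative to the rest of the database, which is exactly the kind of result that may need a further check before being reported as a genuine match.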
Keywords
information retrieval; information theory; text analysis; algorithmic information theory; contextual information retrieval; generic databases; long text-based queries; statistical data outlier detection; Computer science; Databases; Information retrieval; Information theory; Music information retrieval; Pattern recognition; Search engines; Space technology; Statistics; Text analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Theory Workshop, 2008. ITW '08. IEEE
Conference_Location
Porto
Print_ISBN
978-1-4244-2269-2
Electronic_ISBN
978-1-4244-2271-5
Type
conf
DOI
10.1109/ITW.2008.4578672
Filename
4578672