• DocumentCode
    2300542
  • Title
    Contextual information retrieval based on algorithmic information theory and statistical outlier detection
  • Author
    Martinez, Rafael ; Cebrián, Manuel ; De Borja Rodríguez, Francisco ; Camacho, David
  • Author_Institution
    Dept. de Ing. Inf., Univ. Autonoma de Madrid, Madrid
  • fYear
    2008
  • fDate
    5-9 May 2008
  • Firstpage
    292
  • Lastpage
    297
  • Abstract
    This work presents an information retrieval technique based on algorithmic information theory (using the normalized compression distance), statistical data outlier detection, and a novel database structure. The paper shows how these components can be integrated to retrieve information from generic databases using long text-based queries. Two important problems are addressed. First, we analyze and try to solve the detection of a particular kind of false positive: cases where the distance between two documents is anomalously low even though there is no actual similarity between them. Second, we propose a way to structure the database so that the similarity distance estimation scales well with the size of the query. All design choices are justified with an experimental evaluation.
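As a rough illustration of two ingredients named in the abstract, the sketch below computes the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and then flags query-to-document distances that are unusually low relative to the rest. This record does not state which compressor or outlier test the paper uses: zlib as the compressor, the z-score rule, the threshold of 2.0, and the function names `compressed_size`, `ncd`, and `low_outliers` are all hypothetical choices made for this sketch.

```python
import statistics
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in for the compressor C."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def low_outliers(distances: list[float], z_threshold: float = 2.0) -> list[int]:
    """Indices of distances lying more than z_threshold population standard
    deviations below the mean (a hypothetical outlier criterion)."""
    mean = statistics.mean(distances)
    stdev = statistics.pstdev(distances)
    if stdev == 0:
        # All distances are identical, so none stands out.
        return []
    return [i for i, d in enumerate(distances) if (mean - d) / stdev > z_threshold]


if __name__ == "__main__":
    query = b"information retrieval with compression distances " * 10
    docs = [
        b"information retrieval with compression distances " * 8,
        b"statistical outlier detection in text collections " * 8,
        b"an unrelated recipe for bread and butter pudding " * 8,
    ]
    dists = [ncd(query, d) for d in docs]
    print([round(d, 3) for d in dists])
    print(low_outliers(dists))
```

In the paper's setting, an unusually low distance is what must be scrutinized, since the abstract notes such lows can be false positives; the sketch only shows how such candidates might be singled out, not how the paper decides whether they reflect real similarity.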
  • Keywords
    information retrieval; information theory; text analysis; algorithmic information theory; contextual information retrieval; generic databases; long text-based queries; statistical data outlier detection; Computer science; Databases; Information retrieval; Information theory; Music information retrieval; Pattern recognition; Search engines; Space technology; Statistics; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory Workshop, 2008. ITW '08. IEEE
  • Conference_Location
    Porto
  • Print_ISBN
    978-1-4244-2269-2
  • Electronic_ISBN
    978-1-4244-2271-5
  • Type
    conf
  • DOI
    10.1109/ITW.2008.4578672
  • Filename
    4578672