• DocumentCode
    555974
  • Title
    A web statistics based conflation approach to improve Arabic text retrieval
  • Author
    Ahmed, Farag ; Nürnberger, Andreas
  • Author_Institution
    Data & Knowledge Eng. Group, Otto-von-Guericke-Univ. of Magdeburg, Magdeburg, Germany
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    3
  • Lastpage
    9
  • Abstract
    We present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that is used to group related words based on a revised string-similarity measure. In order to detect and eliminate terms that are created by this process but are most likely not relevant for the query ("noisy terms"), an approach based on mutual information scores computed from web statistical co-occurrence data is proposed. Furthermore, an evaluation of this approach is presented.
  • Keywords
    information retrieval; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Arabic text retrieval; Web statistics; conflation approach; language independent approach; mutual information scores; pure n-gram model; string-similarity measure; unsupervised method; Computational modeling; Computer science; Dictionaries; Morphology; Mutual information; Noise measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on
  • Conference_Location
    Szczecin
  • Print_ISBN
    978-1-4577-0041-5
  • Electronic_ISBN
    978-83-60810-35-4
  • Type
    conf
  • Filename
    6078293