• DocumentCode
    3401305
  • Title
    Automatic keywords extraction from the domain texts: Implementation of the algorithm based on the MapReduce model
  • Author
    Nugumanova, Aliya; Novosselov, Artem; Baiburin, Yerzhan; Karimov, Alexey
  • Author_Institution
    Dept. of Inf. Technol., Eastern Kazakhstan State Tech. Univ., Ust-Kamenogorsk, Kazakhstan
  • fYear
    2013
  • fDate
    11-12 Dec. 2013
  • Firstpage
    186
  • Lastpage
    189
  • Abstract
    Automatic keyword extraction is used in almost all tasks related to natural language processing, such as annotation, indexing, classification, machine translation, and knowledge extraction. Many effective methods and approaches have been developed to solve this problem, and the simplest and most robust of them are based on word statistics. In this paper we describe a statistical method based on the chi-square test. The traditional algorithm implementing this method is inefficient and time-consuming. The aim of the paper is to develop an algorithm for this method based on a distributed computing model. We describe an implementation of the algorithm based on the MapReduce model of distributed computing and present experimental results showing the benefits of distributed computing.
  • Keywords
    distributed processing; information retrieval; natural language processing; text analysis; Chi-square test; MapReduce model; automatic keywords extraction; distributed computing model; domain texts; statistical method; Abstracts; Barium; Government; Human computer interaction; Natural languages; keywords extraction; mapreduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Current Trends in Information Technology (CTIT), 2013 International Conference on
  • Conference_Location
    Dubai
  • Type
    conf
  • DOI
    10.1109/CTIT.2013.6749500
  • Filename
    6749500