Title :
Automatic keywords extraction from the domain texts: Implementation of the algorithm based on the MapReduce model
Author :
Nugumanova, Aliya; Novosselov, Artem; Baiburin, Yerzhan; Karimov, Alexey
Author_Institution :
Dept. of Inf. Technol., Eastern Kazakhstan State Tech. Univ., Ust-Kamenogorsk, Kazakhstan
Abstract :
Automatic keyword extraction is used in almost all tasks related to natural language processing, such as annotation, indexing, classification, machine translation, and knowledge extraction. Many effective methods and approaches have been developed to solve this problem, and the simplest and most robust of them are based on word statistics. In this paper we describe a statistical method based on the chi-square test. The traditional algorithm implementing this method is inefficient and time-consuming. The aim of the paper is to develop a version of this algorithm based on a distributed computing model. We therefore describe an implementation of the algorithm based on the MapReduce model of distributed computing and present experimental results showing the benefits of distributed computing.
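The abstract does not give the exact chi-square formulation or the job layout, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes one common variant: each term gets a 2×2 contingency table comparing its frequency in the domain corpus with its frequency in a reference corpus, and terms over-represented in the domain are ranked by the chi-square statistic. The map, shuffle, and reduce phases are simulated in a single process; all function names (`map_document`, `shuffle`, `reduce_counts`, `chi_square_2x2`, `extract_keywords`) and the two-corpus setup are hypothetical.

```python
import re
from itertools import chain


def map_document(text, corpus_label):
    """Hypothetical mapper: emit ((term, corpus_label), 1) for every token."""
    for token in re.findall(r"[a-z]+", text.lower()):
        yield (token, corpus_label), 1


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would between phases."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


def reduce_counts(key, values):
    """Hypothetical reducer: sum the partial counts for one (term, corpus) key."""
    return key, sum(values)


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (no continuity correction).

    a: term in domain corpus, b: other tokens in domain corpus,
    c: term in reference corpus, d: other tokens in reference corpus.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def extract_keywords(domain_docs, reference_docs, top_k=5):
    # Map phase over both corpora (in a real job each document goes to some mapper).
    mapped = chain(
        chain.from_iterable(map_document(t, "domain") for t in domain_docs),
        chain.from_iterable(map_document(t, "reference") for t in reference_docs),
    )
    # Shuffle + reduce phase: total count per (term, corpus) key.
    counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

    domain_total = sum(v for (_, lab), v in counts.items() if lab == "domain")
    ref_total = sum(v for (_, lab), v in counts.items() if lab == "reference")

    scores = {}
    for term in {t for (t, _) in counts}:
        a = counts.get((term, "domain"), 0)
        c = counts.get((term, "reference"), 0)
        # Keep only terms relatively more frequent in the domain corpus.
        if a * ref_total <= c * domain_total:
            continue
        scores[term] = chi_square_2x2(a, domain_total - a, c, ref_total - c)

    # Highest chi-square first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    print(extract_keywords(["mapreduce mapreduce data"], ["data data cat"]))
```

Keeping the per-term statistic separate from the counting phases mirrors why the MapReduce model helps here: the expensive part is counting term occurrences across large corpora, which parallelizes across mappers, while the chi-square score for each term needs only its few aggregated counts plus the corpus totals.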
Keywords :
distributed processing; information retrieval; natural language processing; text analysis; Chi-square test; MapReduce model; automatic keywords extraction; distributed computing model; domain texts; statistical method; Abstracts; Barium; Government; Human computer interaction; Natural languages; keywords extraction; mapreduce
Conference_Title :
2013 International Conference on Current Trends in Information Technology (CTIT)
Conference_Location :
Dubai
DOI :
10.1109/CTIT.2013.6749500