DocumentCode :
653893
Title :
Distributed classification of Persian News (Case study: Hamshahri News dataset)
Author :
Esmaeili, Leila ; Akbari, Mohammad Kazem ; Amiry, Vahid ; Sharifian, Saeed
Author_Institution :
Comput. Eng. & Inf. Technol. Dept., Amirkabir Univ. of Technol., Tehran, Iran
fYear :
2013
fDate :
Oct. 31 2013-Nov. 1 2013
Firstpage :
46
Lastpage :
51
Abstract :
News classification identifies the topic that a news article's content most likely refers to. In this paper, we classify news articles by distance detection in the vector space model. In this method, distances are calculated between the news article's vector and the weighted frequency vector of each category, and the article is assigned to the category at minimum distance. Because of the volume of news articles on each topic, extracting keywords, building weighted frequency vectors, and computing vector distances are very time-consuming operations. Therefore, to improve performance and calculation accuracy and to reduce execution time, we use MapReduce, a distributed programming model, to implement and execute distributed classification of the news articles. This research is the first attempt to classify Persian data in a distributed manner, and its results can be applied to other text mining areas in any language. It is worth mentioning that we have successfully implemented our method on the supercomputer of Amirkabir University of Technology.
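The minimum-distance idea in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not name the weighting scheme or the distance measure, so the IDF-style weighting, the cosine distance, the whitespace tokenizer, the toy English articles, and all function names below are assumptions chosen for illustration. The `map_doc` and `reduce_counts` functions only imitate the map and reduce phases in memory; a real Hadoop job would be structured differently.

```python
import math
from collections import Counter, defaultdict


def map_doc(category, text):
    """Map step (simulated): emit (category, term counts) for one article."""
    return category, Counter(text.lower().split())


def reduce_counts(pairs):
    """Reduce step (simulated): sum the term counts emitted for each category."""
    totals = defaultdict(Counter)
    for category, counts in pairs:
        totals[category].update(counts)
    return dict(totals)


def weight(counts, idf):
    """Weighted frequency vector: term count times an assumed IDF-style weight."""
    return {term: c * idf.get(term, 0.0) for term, c in counts.items()}


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(v * b.get(term, 0.0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def train(labelled_articles):
    """Build one weighted frequency vector per category."""
    category_counts = reduce_counts(map_doc(c, t) for c, t in labelled_articles)
    n = len(category_counts)
    # Number of categories in which each term occurs.
    df = Counter(term for counts in category_counts.values() for term in counts)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {c: weight(counts, idf) for c, counts in category_counts.items()}
    return vectors, idf


def classify(text, vectors, idf):
    """Assign the article to the category whose vector is at minimum distance."""
    _, counts = map_doc(None, text)
    doc_vector = weight(counts, idf)
    return min(vectors, key=lambda c: cosine_distance(doc_vector, vectors[c]))


# Toy data, invented for this sketch (the Hamshahri dataset is Persian).
articles = [
    ("sport", "football match goal team"),
    ("economy", "bank market inflation price"),
    ("sport", "team coach goal"),
]
category_vectors, idf_weights = train(articles)
print(classify("the team scored a goal", category_vectors, idf_weights))
```

Terms that never occur in the training articles receive zero weight here, so they do not affect the distance. Persian text would also need a proper tokenizer and normalizer in place of `split()`.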
Keywords :
data mining; information resources; pattern classification; Amirkabir University of Technology; Hamshahri News dataset; MapReduce; Persian News; Persian data classification; distance detection; distributed classification; distributed programming model; keywords extraction; news article classification; news vector; text mining; vector distances; vector space model; weighted frequency vectors; Classification algorithms; Computational modeling; Hardware; Manuals; Open source software; Sorting; Text categorization; Apache Hadoop; Distributed Computing; MapReduce; Text Classification; Vector Space Model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Knowledge Engineering (ICCKE), 2013 3th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-2092-1
Type :
conf
DOI :
10.1109/ICCKE.2013.6682829
Filename :
6682829