DocumentCode :
131246
Title :
Evaluating preprocessing by turing machine in text categorization
Author :
Ghalehtaki, Razieh Abbasi ; Khotanlou, Hassan ; Esmaeilpour, Mansour
Author_Institution :
Dept. of Comput. Eng., Islamic Azad Univ., Hamedan, Iran
fYear :
2014
fDate :
4-6 Feb. 2014
Firstpage :
1
Lastpage :
6
Abstract :
With the growth of the World Wide Web, text categorization has become a key way to handle and organize large amounts of data. Automatic text categorization has three steps: preprocessing, extraction of relevant features, and categorization of documents into specified categories. In this article, we propose a new preprocessing method based on a Turing machine. All four preprocessing steps (sentence segmentation, tokenization, stop-word removal and word stemming) are performed by a Turing machine. A support vector machine model on the Reuters and PAGOD datasets is used to show the importance of preprocessing by Turing machine. We used term weighting, feature subset selection and feature reduction techniques to find the best document representation. Experiments show that our proposed method is more accurate than the other methods compared.
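To give a rough idea of what "preprocessing by Turing machine" can mean, the sketch below simulates a single-tape deterministic Turing machine that performs only the tokenization step. It is not the authors' machine: the states (`gap`, `word`, `halt`), the tape symbols `SEP` and `SKIP`, and the helpers `delta`, `run_machine` and `tokenize` are all hypothetical choices made for illustration. The machine lowercases letters and digits, writes a separator after each word, and marks extra delimiters as ignorable; reading the tape back yields the tokens.

```python
import string

BLANK = None   # symbol read from cells beyond the input
SEP = "|"      # hypothetical tape symbol marking a token boundary
SKIP = "#"     # hypothetical tape symbol marking a cell to ignore
WORD_CHARS = set(string.ascii_letters + string.digits)


def delta(state, symbol):
    """Hypothetical transition function: (state, symbol) -> (next_state, write, move)."""
    if symbol is BLANK:
        return "halt", BLANK, 0
    if symbol in WORD_CHARS:
        # Entering or continuing a word: copy the character in lowercase.
        return "word", symbol.lower(), +1
    if state == "word":
        # First delimiter after a word: write a token boundary.
        return "gap", SEP, +1
    # Leading or repeated delimiters: mark the cell as ignorable.
    return "gap", SKIP, +1


def run_machine(text, max_steps=100_000):
    """Run the machine on `text` and return the final tape contents."""
    tape = list(text)
    head, state = 0, "gap"
    for _ in range(max_steps):
        symbol = tape[head] if head < len(tape) else BLANK
        state, write, move = delta(state, symbol)
        if state == "halt":
            return tape
        tape[head] = write
        head += move
    raise RuntimeError("machine did not halt within max_steps")


def tokenize(text):
    """Read the halted tape back: drop SKIP cells and split on SEP."""
    tape = run_machine(text)
    cleaned = "".join(c for c in tape if c != SKIP)
    return [tok for tok in cleaned.split(SEP) if tok]


print(tokenize("The cat, sat!"))  # ['the', 'cat', 'sat']
```

The head only ever moves right, so this particular machine is no more powerful than a finite-state transducer. Steps such as stemming or stop-word removal would need further states and, in a faithful Turing-machine design, leftward moves over the tape.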
Keywords :
Turing machines; support vector machines; text analysis; PAGOD dataset; Reuters dataset; Turing machine; World Wide Web; automatic text categorization; document categorization; document preprocessing; document representation; feature extraction; feature reduction technique; feature subset selection technique; sentence segmentation; stop word removal; support vector machine model; term weighting; text organization; tokenization; word stemming; Computers; Educational institutions; Magnetic heads; Support vector machines; Text categorization; Turing machines; Weight measurement; Preprocessing; Support Vector Machines; Turing Machine; text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems (ICIS), 2014 Iranian Conference on
Conference_Location :
Bam
Print_ISBN :
978-1-4799-3350-1
Type :
conf
DOI :
10.1109/IranianCIS.2014.6802540
Filename :
6802540