DocumentCode
131246
Title
Evaluating preprocessing by Turing machine in text categorization
Author
Ghalehtaki, Razieh Abbasi ; Khotanlou, Hassan ; Esmaeilpour, Mansour
Author_Institution
Dept. of Comput. Eng., Islamic Azad Univ., Hamedan, Iran
fYear
2014
fDate
4-6 Feb. 2014
Firstpage
1
Lastpage
6
Abstract
With the growth of the World Wide Web, text categorization has become a key way to handle and organize large amounts of data. Automatic text categorization has three steps: preprocessing, extracting relevant features, and categorizing documents into specified categories. In this article, we propose a new preprocessing method based on a Turing machine. All four preprocessing steps (sentence segmentation, tokenization, stop word removal, and word stemming) are performed by the Turing machine. A support vector machine model is applied to the Reuters and PAGOD datasets to show the importance of preprocessing by Turing machine. We use term weighting, feature subset selection, and feature reduction techniques to find the best document representation. Experiments show that our proposed method is more accurate than other methods.
Keywords
Turing machines; support vector machines; text analysis; PAGOD dataset; Reuters dataset; Turing machine; World Wide Web; automatic text categorization; document categorization; document preprocessing; document representation; feature extraction; feature reduction technique; feature subset selection technique; sentence segmentation; stop word removal; support vector machine model; term weighting; text organization; tokenization; word stemming; Computers; Educational institutions; Magnetic heads; Support vector machines; Text categorization; Turing machines; Weight measurement; Preprocessing; Support Vector Machines; Turing Machine; text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems (ICIS), 2014 Iranian Conference on
Conference_Location
Bam
Print_ISBN
978-1-4799-3350-1
Type
conf
DOI
10.1109/IranianCIS.2014.6802540
Filename
6802540