• DocumentCode
    131246
  • Title
    Evaluating preprocessing by Turing machine in text categorization
  • Author
    Ghalehtaki, Razieh Abbasi ; Khotanlou, Hassan ; Esmaeilpour, Mansour
  • Author_Institution
    Dept. of Comput. Eng., Islamic Azad Univ., Hamedan, Iran
  • fYear
    2014
  • fDate
    4-6 Feb. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    With the growth of the World Wide Web, text categorization has become a key way to handle and organize large amounts of data. Automatic text categorization has three steps: preprocessing, extracting relevant features, and categorizing documents into specified categories. In this article, we propose a new preprocessing method based on a Turing machine. All four preprocessing steps (sentence segmentation, tokenization, stop word removal, and word stemming) are performed by the Turing machine. A support vector machine model applied to the Reuters and PAGOD datasets is used to demonstrate the importance of preprocessing by Turing machine. We used term weighting, feature subset selection, and feature reduction techniques to find the best document representation. Experiments show that our proposed method is more accurate than other methods.
  • Keywords
    Turing machines; support vector machines; text analysis; PAGOD dataset; Reuters dataset; Turing machine; World Wide Web; automatic text categorization; document categorization; document preprocessing; document representation; feature extraction; feature reduction technique; feature subset selection technique; sentence segmentation; stop word removal; support vector machine model; term weighting; text organization; tokenization; word stemming; Computers; Educational institutions; Magnetic heads; Text categorization; Weight measurement; Preprocessing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (ICIS), 2014 Iranian Conference on
  • Conference_Location
    Bam
  • Print_ISBN
    978-1-4799-3350-1
  • Type
    conf
  • DOI
    10.1109/IranianCIS.2014.6802540
  • Filename
    6802540