• DocumentCode
    3642078
  • Title
    Analysis of preprocessing methods on classification of Turkish texts
  • Author
    Dilara Torunoğlu;Erhan Çakirman;Murat Can Ganiz;Selim Akyokuş;M. Zahid Gürbüz
  • Author_Institution
    Department of Computer Engineering, Doğ
  • fYear
    2011
  • fDate
    6/1/2011 12:00:00 AM
  • Firstpage
    112
  • Lastpage
    117
  • Abstract
    Preprocessing is a critical step in information retrieval and text mining. The objective of this study is to analyze the effect of preprocessing methods on the classification of Turkish texts. We compiled two large datasets from Turkish newspapers using a crawler. On these two datasets and two additional ones, we perform a detailed analysis of preprocessing methods for Turkish text classification, such as stemming, stopword filtering, and word weighting. We report the results of extensive experiments.
  • Keywords
    "Text categorization","Support vector machines","Training","Classification algorithms","Filtering","Text mining","Information retrieval"
  • Publisher
    ieee
  • Conference_Titel
    Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
  • Print_ISBN
    978-1-61284-919-5
  • Type
    conf
  • DOI
    10.1109/INISTA.2011.5946084
  • Filename
    5946084
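The abstract names stemming, stopword filtering, and word weighting as the preprocessing steps under study. The Python sketch below shows what such a pipeline can look like. It is not the authors' implementation. The stopword list, the 5-character prefix truncation (a crude stand-in for a real Turkish morphological stemmer), and the raw tf × log(N/df) weighting are all assumptions made for illustration. Each step can be switched off, so the configurations can be compared with and without it.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny Turkish stopword list (illustration only).
STOPWORDS = {"ve", "bir", "bu", "da", "de", "için", "ile"}

# Assumed truncation length for the toy prefix stemmer.
PREFIX_LEN = 5


def turkish_lower(text: str) -> str:
    # str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot,
    # which is wrong for Turkish. Map the two capitals explicitly first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text: str, stem: bool = True, remove_stopwords: bool = True) -> list[str]:
    # Lowercase, split on whitespace, and strip ASCII punctuation from token edges.
    tokens = [t.strip(string.punctuation) for t in turkish_lower(text).split()]
    tokens = [t for t in tokens if t]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        # Crude fixed-prefix stemming: keep only the first PREFIX_LEN characters.
        tokens = [t[:PREFIX_LEN] for t in tokens]
    return tokens


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    # Weight each term by raw term frequency times log(N / document frequency).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


docs = [
    "Ekonomi haberleri ve piyasalar",
    "Futbol maçı bu akşam",
    "Piyasalarda ekonomik büyüme",
]
vectors = tfidf([preprocess(d) for d in docs])
```

With stemming on, "piyasalar" and "piyasalarda" both become "piyas", so the first and third documents share terms that stay distinct without stemming. A term that occurs in every document gets weight zero under this weighting scheme.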