Analysis of preprocessing methods on classification of Turkish texts

Author

Dilara Torunoğlu;Erhan Çakirman;Murat Can Ganiz;Selim Akyokuş;M. Zahid Gürbüz

Author_Institution

Department of Computer Engineering, Doğ

fYear

2011

fDate

6/1/2011

Firstpage

112

Lastpage

117

Abstract

Preprocessing is a critical step in information retrieval and text mining. This study analyzes the effect of preprocessing methods on the classification of Turkish texts. We compiled two large datasets from Turkish newspapers using a crawler. On these two datasets and two additional datasets, we perform a detailed analysis of preprocessing methods such as stemming, stopword filtering, and word weighting, and we report the results of extensive experiments.

Keywords

"Text categorization","Support vector machines","Training","Classification algorithms","Filtering","Text mining","Information retrieval"

Publisher

IEEE

Conference_Title

Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on

Print_ISBN

978-1-61284-919-5

Type

conf

DOI

10.1109/INISTA.2011.5946084

Filename

5946084

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3642078