Title :
Examining the impact of stemming on clustering Turkish texts
Author :
Tunali, Volkan ; Bilgin, Turgay Tugay
Author_Institution :
Fac. of Eng., Maltepe Univ., Istanbul, Turkey
Abstract :
Preprocessing is an important step in information retrieval and text mining. In this study, we examined the impact of stemming on clustering Turkish texts. We used two datasets compiled from the web sites of Turkish news agencies and performed extensive experiments. We empirically show that there is no significant evidence that stemming always improves the quality of clustering for texts in Turkish. However, when stemming is used, the dimensionality of the document-term matrix decreases dramatically without adversely affecting clustering performance. As a result, applying stemming is highly recommended when clustering Turkish texts.
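The dimensionality claim in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the suffix list, the `toy_stem` helper, and the three sample documents are hypothetical, chosen only to show how mapping inflected forms to one stem shrinks the number of columns in a document-term matrix. A real Turkish stemmer handles far more morphology than this.

```python
from collections import Counter

# Hypothetical toy suffix list (assumption); longest suffixes are listed first.
SUFFIXES = ("lerin", "ların", "leri", "ları", "ler", "lar", "de", "da")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 2 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def document_term_matrix(docs, normalize=lambda w: w):
    """Return (vocabulary, rows), where each row holds one document's term counts."""
    counts = [Counter(normalize(w) for w in doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows


# Hypothetical sample documents (assumption), already lowercased.
docs = [
    "kitap kitaplar kitapları",
    "evler evlerin evde",
    "kitaplar evler",
]

raw_vocab, raw_rows = document_term_matrix(docs)
stem_vocab, stem_rows = document_term_matrix(docs, toy_stem)

print(len(raw_vocab), "columns without stemming")
print(len(stem_vocab), "columns with toy stemming")
```

On this toy input the matrix shrinks from 6 columns to 2, while the total term count per document is unchanged. Whether clustering quality is preserved on real data is the empirical question the paper addresses.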
Keywords :
Web sites; data mining; information retrieval; pattern clustering; text analysis; Turkish news agencies; Turkish text clustering; document-term matrix dimensionality reduction; stemming impact; text clustering quality; text mining; Clustering algorithms; Educational institutions; Entropy; document clustering; preprocessing; stemming
Conference_Title :
Innovations in Intelligent Systems and Applications (INISTA), 2012 International Symposium on
Conference_Location :
Trabzon
Print_ISBN :
978-1-4673-1446-6
DOI :
10.1109/INISTA.2012.6246966