DocumentCode :
2734042
Title :
Preprocessing of Slovak Blog Articles for Clustering
Author :
Kuzar, Tomas ; Navrat, Pavol
Author_Institution :
FIIT, Inst. of Inf. & Software Eng., STU, Bratislava, Slovakia
Volume :
3
fYear :
2010
fDate :
Aug. 31 2010-Sept. 3 2010
Firstpage :
314
Lastpage :
317
Abstract :
Web content clustering is a very important part of topic detection and tracking. In this paper we focus on the pre-processing phase of web content clustering, specifically for blog articles published in the Slovak language. We evaluate the impact of different data pre-processing methods on the success of blog clustering. We found that applying various text data manipulation techniques during pre-processing can improve the quality of the clusters. Cluster quality is measured by traditional clustering metrics such as precision, recall and F-measure.
Keywords :
Web sites; data mining; natural language processing; pattern clustering; text analysis; Slovak blog article; Slovak language; Web content clustering; blog clustering; cluster quality; data preprocessing method; text data manipulation technique; topic detection; tracking issue; Buildings; Data preprocessing; Dictionaries; Internet; Media; Web sites; Categorization; Text Mining; Text Preprocessing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-8482-9
Electronic_ISBN :
978-0-7695-4191-4
Type :
conf
DOI :
10.1109/WI-IAT.2010.273
Filename :
5614178