DocumentCode
3286406
Title
Towards effective processing of large text collections
Author
Szymanski, Janusz ; Krawczyk, Harald
Author_Institution
Dept. of Electron., Telecommun. & Inf., Gdansk Univ. of Technol., Gdańsk, Poland
fYear
2012
fDate
18-20 Sept. 2012
Firstpage
265
Lastpage
270
Abstract
This article describes an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments, we evaluate the parallel computation of similarity matrices and of the k-means algorithm. The test datasets were prepared as graphs built from Wikipedia articles connected by links. When creating the clustering data packages, we compute eigenvector-eigenvalue pairs for visualizing the datasets. We describe the method used to evaluate clustering quality. Finally, we discuss the results achieved, point out some improvements, and outline perspectives for future development.
Keywords
Web sites; data visualisation; eigenvalues and eigenfunctions; matrix algebra; pattern clustering; text analysis; Wikipedia articles; clustering data packages; clustering quality; dataset visualizations; eigenvalues; eigenvectors; elementary operations; graphs; k-means algorithm; large text collections; parallel computations; similarity matrices; textual data categorization; PCA; documents categorization; text clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing Technology (INTECH), 2012 Second International Conference on
Conference_Location
Casablanca
Print_ISBN
978-1-4673-2678-0
Type
conf
DOI
10.1109/INTECH.2012.6457784
Filename
6457784