Title :
Feature Space Transformations in Document Clustering
Author :
Csorba, Kristóf ; Vajk, István
Author_Institution :
Dept. of Autom. & Appl. Informatics, Budapest Univ. of Technol. & Econ.
Abstract :
Document clustering is a part of information retrieval in which documents written in natural language are assigned to groups based on some criterion. In the present case, documents with similar topics are grouped together. Because many methods and additional noise filtering techniques exist for this task, this paper focuses on the composition of such transformations and on the comparison of configurations built from subsets of these transformations, which serve as tiles of the whole procedure. Five tile methods are used: term filtering, frequency quantizing, principal component analysis (PCA), term clustering, and document clustering itself. The configurations are compared on maximal achieved F-measure and time consumption to find the best composition.
Keywords :
document handling; information retrieval; pattern clustering; principal component analysis; PCA; document clustering; document collection; document retrieval; feature space transformation; frequency quantization; information retrieval; natural language; noise filtering technique; principal component analysis; term clustering; term filtering; Automation; Filtering; Frequency; Informatics; Information retrieval; Natural languages; Principal component analysis; Space technology; Text analysis; Tiles;
Conference_Title :
Intelligent Engineering Systems, 2006. INES '06. Proceedings. International Conference on
Conference_Location :
London
Print_ISBN :
0-7803-9708-8
DOI :
10.1109/INES.2006.1689364