DocumentCode
3301048
Title
Enhanced document clustering using fusion of multiscale wavelet decomposition
Author
Hussin, Mahmoud F. ; El Rube, Ibrahim ; Kamel, Mohamed S.
Author_Institution
Arab Acad. for Sci. & Technol. & Maritime Transp., Alexandria
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
870
Lastpage
874
Abstract
Most term weighting schemes for text document clustering rely on term-frequency-based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of terms in a document, is that in most cases they are limited in their ability to filter out noise. In this paper, we propose a novel weighting approach that combines a fusion technique with wavelet-based estimation to achieve consistent improvements in clustering. Our approach involves three steps: (1) term frequency (TF) weighting, (2) multiple-wavelet estimation, and (3) data fusion. Specifically, we apply the wavelet at different scales to produce different estimates of the original TF, and use the fusion of these estimates as new features for clustering the documents. Experiments on clustering documents from the Reuters corpus verify that our weighting schemes using wavelet and fusion techniques effectively reduce noise and improve clustering performance, as evaluated using entropy and F-measure.
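The three steps named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give these details: the Haar wavelet, the approximation-only reconstruction (block means), the choice of levels, averaging as the fusion rule, and the function names `term_frequencies`, `haar_approximation`, `fuse`, and `fused_tf_features` are all hypothetical. The result also depends on the order of terms in the vocabulary, which this sketch takes as given.

```python
from collections import Counter


def term_frequencies(tokens, vocabulary):
    """Step 1: raw term-frequency vector over a fixed vocabulary order."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def haar_approximation(signal, level):
    """Step 2: Haar estimate at one scale (assumed wavelet).

    Keeping only the Haar approximation coefficients at `level` and
    discarding the details is equivalent to replacing each block of
    2**level samples with its mean. A shorter final block is averaged
    on its own (an assumed boundary rule).
    """
    block = 2 ** level
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


def fuse(estimates):
    """Step 3: fuse the per-scale estimates by averaging (assumed rule)."""
    count = len(estimates)
    return [sum(values) / count for values in zip(*estimates)]


def fused_tf_features(tokens, vocabulary, levels=(1, 2)):
    """Run the three steps for one document and return its feature vector."""
    tf = term_frequencies(tokens, vocabulary)
    estimates = [haar_approximation(tf, level) for level in levels]
    return fuse(estimates)


if __name__ == "__main__":
    vocabulary = ["a", "b", "c", "d"]
    tokens = ["a", "a", "b", "d"]
    print(fused_tf_features(tokens, vocabulary))
```

For the toy document above, the TF vector is [2, 1, 0, 1]; the level-1 and level-2 estimates are [1.5, 1.5, 0.5, 0.5] and [1, 1, 1, 1], and their average is the fused feature vector. The fused vectors would then be passed to a clustering algorithm in place of the raw TF weights.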
Keywords
document image processing; image fusion; text analysis; wavelet transforms; data fusion; enhanced document clustering; multiple wavelets estimation; multiscale wavelet decomposition fusion; noise filtering; term frequency methods; text document clustering; wavelet-based estimation; weighting approach; Entropy; Explosions; Filtering; Frequency estimation; Indexing; Internet; Noise reduction; Subspace constraints; Wavelet analysis; Wavelet transforms;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Systems and Applications, 2008. AICCSA 2008. IEEE/ACS International Conference on
Conference_Location
Doha
Print_ISBN
978-1-4244-1967-8
Electronic_ISBN
978-1-4244-1968-5
Type
conf
DOI
10.1109/AICCSA.2008.4493632
Filename
4493632