DocumentCode :
3301048
Title :
Enhanced document clustering using fusion of multiscale wavelet decomposition
Author :
Hussin, Mahmoud F. ; El Rube, Ibrahim ; Kamel, Mohamed S.
Author_Institution :
Arab Acad. for Sci. & Technol. & Maritime Transp., Alexandria
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
870
Lastpage :
874
Abstract :
Most term weighting schemes for text document clustering rely on term-frequency analysis of the text content. Because these indexing schemes consider only the occurrences of terms in a document, they are limited in their ability to filter out noise. In this paper, we propose a novel weighting approach that uses a fusion technique combined with wavelet-based estimation to achieve consistent improvements in clustering. Our approach involves three steps: (1) term frequency (TF) weighting, (2) multiple wavelet estimation, and (3) data fusion. Specifically, we apply the wavelet at different scales to produce different estimates of the original TF, and use the fusion of these estimates as new features for clustering the documents. Experiments on clustering documents from the Reuters corpus verify that our weighting schemes using wavelet and fusion techniques effectively reduce noise and improve clustering performance, as evaluated using entropy and F-measure.
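The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the axis along which the wavelet is applied, the boundary handling, or the fusion rule, so everything below is an assumption made for illustration: a Haar approximation (block means) applied along the vocabulary axis, a shorter trailing block averaged on its own, and fusion by element-wise averaging. The function names `term_frequencies`, `haar_smooth`, and `fuse` are hypothetical and do not come from the paper.

```python
from collections import Counter


def term_frequencies(doc, vocabulary):
    """Step 1: raw term-frequency vector of `doc` over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[term]) for term in vocabulary]


def haar_smooth(signal, level):
    """Step 2 (assumed Haar wavelet): estimate the signal at one scale.

    Keeping only the Haar approximation coefficients at `level` and
    reconstructing is the same as replacing every block of 2**level
    samples by its mean. A shorter trailing block is averaged on its own
    (an assumed boundary treatment).
    """
    block = 2 ** level
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


def fuse(estimates):
    """Step 3 (assumed fusion rule): element-wise mean of the estimates."""
    return [sum(values) / len(values) for values in zip(*estimates)]


# Hypothetical four-term vocabulary and toy document.
vocabulary = ["wavelet", "noise", "fusion", "cluster"]
doc = "wavelet wavelet wavelet wavelet fusion fusion"

tf = term_frequencies(doc, vocabulary)
estimates = [haar_smooth(tf, level) for level in (1, 2)]
features = fuse(estimates)
print(features)
```

The fused vector would then replace the raw TF vector as the document's feature representation before clustering. A different wavelet family or fusion rule would change the numbers but not the overall pipeline shape.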
Keywords :
document image processing; image fusion; text analysis; wavelet transforms; data fusion; enhanced document clustering; multiple wavelets estimation; multiscale wavelet decomposition fusion; noise filtering; term frequency methods; text document clustering; wavelet-based estimation; weighting approach; Entropy; Explosions; Filtering; Frequency estimation; Indexing; Internet; Noise reduction; Subspace constraints; Wavelet analysis; Wavelet transforms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications, 2008. AICCSA 2008. IEEE/ACS International Conference on
Conference_Location :
Doha
Print_ISBN :
978-1-4244-1967-8
Electronic_ISBN :
978-1-4244-1968-5
Type :
conf
DOI :
10.1109/AICCSA.2008.4493632
Filename :
4493632