DocumentCode :
1928561
Title :
Competitive learning mechanisms for scalable, incremental and balanced clustering of streaming texts
Author :
Banerjee, Arindam ; Ghosh, Joydeep
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
Volume :
4
fYear :
2003
fDate :
20-24 July 2003
Firstpage :
2697
Abstract :
Automated clustering of text documents such as Web pages is becoming increasingly important for organizing the vast amounts of information available over the Internet. The problem is also very challenging, since text is typically represented by very high-dimensional (> 1000), normalized (unit-length) vectors. Moreover, documents are continually being created, and their statistics change over time (for example, as news stories change), so one needs incremental learning algorithms that can adapt to non-stationary environments. We model high-dimensional, normalized data using a mixture of von Mises-Fisher distributions, and then modify this generative model in a principled way to yield frequency-sensitive competitive learning mechanisms that are applicable to streaming data and produce balanced clusters. Experimental results on clustering high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
Keywords :
Internet; pattern clustering; text analysis; unsupervised learning; Web pages; automated clustering text documents; balanced clustering; competitive learning mechanisms; documents; frequency sensitive competitive learning mechanisms; high-dimensional normalized data; incremental clustering; nonstationary environments; scalable clustering; statistics; von Mises-Fisher distributions; Clustering algorithms; Frequency; Learning systems; Navigation; Organizing; Sparse matrices; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks, 2003. Proceedings of the International Joint Conference on
ISSN :
1098-7576
Print_ISBN :
0-7803-7898-9
Type :
conf
DOI :
10.1109/IJCNN.2003.1223993
Filename :
1223993