DocumentCode :
2971269
Title :
TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams
Author :
Reed, Joel W. ; Jiao, Yu ; Potok, Thomas E. ; Klump, Brian A. ; Elmore, Mark T. ; Hurson, Ali R.
Author_Institution :
Appl. Software Eng. Res. Group, Oak Ridge Nat. Lab., TN
fYear :
2006
fDate :
Dec. 2006
Firstpage :
258
Lastpage :
263
Abstract :
In this paper, we propose a new term weighting scheme called term frequency-inverse corpus frequency (TF-ICF). It does not require term frequency information from other documents within the document collection and thus enables us to generate the document vectors of N streaming documents in linear time. We evaluated the effectiveness of the proposed approach in the context of a machine learning application, unsupervised document clustering, comparing it with five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF can produce document clusters comparable in quality to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
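The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the form used here, log(1 + tf) multiplied by log((N + 1) / (n + 1)) with N and n taken from a fixed reference corpus, is an assumption, as are the names `tf_icf_vector`, `corpus_doc_freq`, and `corpus_size`. What the sketch shows is that each document is weighted using only its own term counts and a precomputed table, so N streaming documents need one pass each.

```python
import math
from collections import Counter


def tf_icf_vector(tokens, corpus_doc_freq, corpus_size):
    """Weight one document's terms without consulting other stream documents.

    tokens          -- list of terms in the incoming document
    corpus_doc_freq -- hypothetical precomputed table: term -> number of
                       reference-corpus documents containing that term
    corpus_size     -- number of documents in the static reference corpus

    The formula below is an assumed form, not quoted from the abstract.
    """
    counts = Counter(tokens)
    weights = {}
    for term, tf in counts.items():
        df = corpus_doc_freq.get(term, 0)  # unseen terms get df = 0
        icf = math.log((corpus_size + 1) / (df + 1))
        weights[term] = math.log(1 + tf) * icf

    # Normalize to unit length so vectors of different documents are comparable.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return weights
    return {term: w / norm for term, w in weights.items()}


# Toy reference statistics, invented for illustration only.
reference_df = {"the": 1000, "stream": 10}
vector = tf_icf_vector(["the", "stream", "stream", "cluster"], reference_df, 1000)
```

Because the reference table is fixed ahead of time, the cost per document depends only on that document's length, which is consistent with the linear-time claim in the abstract.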
Keywords :
computational complexity; pattern clustering; text analysis; unsupervised learning; dynamic data stream clustering; machine learning application; term frequency-inverse corpus frequency; term weighting scheme; unsupervised document clustering; Computational complexity; Computer science; Data engineering; Frequency conversion; Information filtering; Laboratories; Machine learning; Parallel algorithms; Software engineering; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications, 2006. ICMLA '06. 5th International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
0-7695-2735-3
Type :
conf
DOI :
10.1109/ICMLA.2006.50
Filename :
4041501