Title :
A fuzzy-based algorithm for Web document clustering
Author :
Friedman, Menahem ; Kandel, Abraham ; Schneider, Moti ; Last, Mark ; Shapira, B. ; Elovici, Yuval ; Zaafrany, Omer
Author_Institution :
Dept. of Phys., Nucl. Res. Center-Negev, Beer-Sheva, Israel
Abstract :
Most existing methods of document clustering are based on a model that assumes a fixed-size vector representation of key terms or key phrases within each document. This assumption is not realistic in large and diverse document collections such as the World Wide Web. We propose a new fuzzy-based document clustering method (FDCM) for clustering documents that are represented by variable-length vectors. Each vector element consists of two fields: the first identifies a key phrase in the document (its name), and the second denotes the frequency of that key phrase within the particular document. A new averaging method is defined for calculating the cluster centroid, and a membership function is developed for relating new documents to existing clusters. The proposed approach is described in detail, and we show how it is implemented in a real-world application from the area of Web monitoring.
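The abstract describes three pieces: a variable-length document vector of (key phrase, frequency) pairs, a cluster centroid computed by averaging, and a membership function that relates a new document to existing clusters. The sketch below illustrates only that overall structure under stated assumptions; it is not the FDCM algorithm. The plain arithmetic mean stands in for the paper's new averaging method, cosine similarity stands in for its membership function, and the names `centroid`, `membership`, `assign` and the `threshold` value of 0.5 are hypothetical.

```python
import math
from typing import Dict, List, Optional

# A document is a variable-length vector: key phrase -> frequency in that document.
Doc = Dict[str, float]


def centroid(docs: List[Doc]) -> Doc:
    """Mean frequency of every phrase seen anywhere in the cluster.

    Assumption: a phrase missing from a document contributes 0. This plain
    arithmetic mean is a stand-in, not the FDCM averaging method.
    """
    if not docs:
        return {}
    totals: Doc = {}
    for doc in docs:
        for phrase, freq in doc.items():
            totals[phrase] = totals.get(phrase, 0.0) + freq
    n = len(docs)
    return {phrase: total / n for phrase, total in totals.items()}


def membership(doc: Doc, center: Doc) -> float:
    """Hypothetical membership degree in [0, 1]: cosine similarity of sparse vectors.

    Frequencies are non-negative, so the cosine is never negative; the result
    is clamped to 1.0 to absorb floating-point rounding.
    """
    dot = sum(freq * center.get(phrase, 0.0) for phrase, freq in doc.items())
    norm_doc = math.sqrt(sum(f * f for f in doc.values()))
    norm_center = math.sqrt(sum(f * f for f in center.values()))
    if norm_doc == 0.0 or norm_center == 0.0:
        return 0.0
    return min(1.0, dot / (norm_doc * norm_center))


def assign(doc: Doc, clusters: List[List[Doc]], threshold: float = 0.5) -> Optional[int]:
    """Index of the cluster with the highest membership degree, or None when
    no degree reaches the (hypothetical) threshold."""
    best_index: Optional[int] = None
    best_degree = threshold
    for i, members in enumerate(clusters):
        degree = membership(doc, centroid(members))
        if degree >= best_degree:
            best_index, best_degree = i, degree
    return best_index


if __name__ == "__main__":
    clusters = [
        [{"fuzzy logic": 3, "clustering": 2}, {"fuzzy logic": 1, "membership": 2}],
        [{"web monitoring": 4, "traffic": 1}],
    ]
    print(assign({"fuzzy logic": 2, "clustering": 1}, clusters))  # → 0
```

Because each document keeps only the phrases it actually contains, the centroid is taken over the union of phrases in the cluster, which is one simple way to average vectors of different lengths. A document whose membership degree is below the threshold for every cluster returns `None`, which a caller could treat as the seed of a new cluster.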
Keywords :
Internet; document handling; fuzzy logic; pattern clustering; Web document clustering; Web monitoring; World Wide Web; fuzzy-based algorithm; vector representation; Clustering algorithms; Clustering methods; Computer science; Educational institutions; Electronic mail; Frequency; Information systems; Monitoring; Systems engineering and theory; Web sites;
Conference_Title :
IEEE Annual Meeting of the Fuzzy Information Processing (NAFIPS '04), 2004
Print_ISBN :
0-7803-8376-1
DOI :
10.1109/NAFIPS.2004.1337355