Title :
Classification of internet newsgroup articles using RACE
Author :
Runkler, Thomas A. ; Bezdek, James C.
Author_Institution :
Corp. Technol., Siemens AG, Munich, Germany
Abstract :
Conventional point-prototype clustering methods such as fuzzy c-means extract prototypes and partitions from numerical data. Clustering is often done by an alternating cluster estimation (ACE) algorithm, which may be specified either by an objective function or by user-defined membership and prototype functions. Non-numerical data such as text can often be represented numerically by (pairwise) relation matrices. Clusters in these relational data can be found by relational alternating cluster estimation (RACE). For text data with Levenshtein distances, the RACE cluster centers can be used as keywords. We apply RACE to extract keywords from the articles of internet newsgroups and use these keywords to build a classifier that automatically assigns (previously unknown) articles to the most appropriate newsgroup.
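The pipeline outlined in the abstract (pairwise Levenshtein relation matrix, cluster centers used as keywords, nearest-keyword classification) can be illustrated with a minimal sketch. This is not the RACE algorithm from the paper: the record does not give its membership and prototype functions, so the sketch assumes a crisp, medoid-style alternating update as a stand-in. The function names, the farthest-point initialisation, and the newsgroup names in the usage note are all hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def relation_matrix(words: Sequence[str]) -> List[List[int]]:
    """Pairwise Levenshtein distances: the relational representation of the text data."""
    return [[levenshtein(u, v) for v in words] for u in words]


def medoid_keywords(words: Sequence[str], c: int, max_iter: int = 20) -> List[str]:
    """Assumed stand-in for RACE: alternate crisp assignment and medoid update."""
    dist = relation_matrix(words)
    n = len(words)
    # Assumption: farthest-point initialisation, starting from the first word.
    centers = [0]
    while len(centers) < c:
        centers.append(max(range(n), key=lambda i: min(dist[i][m] for m in centers)))
    for _ in range(max_iter):
        # Assign every word to its nearest current center.
        labels = [min(range(c), key=lambda k: dist[i][centers[k]]) for i in range(n)]
        updated = []
        for k in range(c):
            members = [i for i in range(n) if labels[i] == k]
            if not members:
                updated.append(centers[k])  # keep the old center of an empty cluster
                continue
            # New center: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if updated == centers:
            break
        centers = updated
    return [words[i] for i in centers]


def classify(article_words: Sequence[str], group_keywords: Dict[str, List[str]]) -> str:
    """Assign an article to the group whose keywords lie closest to its words."""
    def score(keywords: List[str]) -> int:
        return sum(min(levenshtein(w, k) for k in keywords) for w in article_words)
    return min(group_keywords, key=lambda g: score(group_keywords[g]))
```

As a usage example, `medoid_keywords(words, c)` could be run once per newsgroup on the words of its training articles, and the resulting lists passed to `classify` as a dictionary such as `{"comp.ai.fuzzy": [...], "rec.food.cooking": [...]}` (hypothetical group names). A crisp medoid update keeps the sketch short; a fuzzy membership function would be closer to the alternating cluster estimation family named in the abstract.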
Keywords :
Internet; classification; information resources; relational databases; Internet newsgroup articles classification; Levenshtein distances; alternating cluster estimation; fuzzy c-means; keywords; objective function; prototype clustering; relational alternating cluster estimation; relational data; text data; Clustering algorithms; Communications technology; Computer science; Constraint optimization; Data mining; Information analysis; Partitioning algorithms; Prototypes; Virtual colonoscopy
Conference_Title :
Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2001
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-7078-3
DOI :
10.1109/NAFIPS.2001.943760