Title :
Using Clustering and Co-Training to Boost Classification Performance
Author :
Kyriakopoulou, Antonia
Author_Institution :
Athens Univ. of Econ. & Bus., Athens
Abstract :
This paper shows that the performance of a linear SVM classifier can be improved by utilizing meta-information derived from clustering. Clustering aims to discover extra knowledge about the structure of the whole dataset (both the training and testing sets). A co-training algorithm is introduced that uses clustering as a complementary step to text classification. At each iteration step of the algorithm, the clustering phase augments the feature space with a new meta-feature that reflects each document's cluster membership, and the classification phase introduces another meta-feature that indicates class membership. Experimental results on widely used datasets demonstrate the effectiveness of the proposed approaches, especially for small training sets.
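The iteration described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the clustering method or the meta-feature encoding, so this sketch assumes plain k-means, a one-hot encoding of cluster and class membership, and a nearest-centroid classifier standing in for the linear SVM to keep the example dependency-free. All function names (`kmeans`, `nearest_centroid_predict`, `cotrain`) are hypothetical.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return assign

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): pick the class with the nearest centroid."""
    classes = sorted(set(train_y))
    cents = {c: mean([x for x, y in zip(train_x, train_y) if y == c])
             for c in classes}
    return [min(classes, key=lambda c: dist2(x, cents[c])) for x in test_x]

def cotrain(train_x, train_y, test_x, k=2, rounds=2):
    """Alternate clustering and classification, growing the feature space."""
    classes = sorted(set(train_y))
    n_train = len(train_x)
    # Clustering sees the whole dataset: training and testing documents.
    feats = [list(v) for v in train_x + test_x]
    preds = []
    for _ in range(rounds):
        # Clustering phase: append a cluster-membership meta-feature.
        clusters = kmeans(feats, k)
        feats = [f + one_hot(c, k) for f, c in zip(feats, clusters)]
        # Classification phase: train on labeled rows, predict the rest.
        preds = nearest_centroid_predict(feats[:n_train], train_y,
                                         feats[n_train:])
        # Append a class-membership meta-feature. Assumption: true labels
        # for training documents, predicted labels for testing documents.
        labels = list(train_y) + preds
        feats = [f + one_hot(classes.index(l), len(classes))
                 for f, l in zip(feats, labels)]
    return preds

train_x = [[0.0, 0.0], [10.0, 10.0]]
train_y = ["a", "b"]
test_x = [[0.5, 0.2], [9.5, 10.1], [0.1, 0.9], [10.2, 9.7]]
print(cotrain(train_x, train_y, test_x))
```

With only one labeled document per class, as in this toy input, the cluster meta-feature is the only information the classifier receives about how the unlabeled documents group together, which is the small-training-set situation the abstract highlights.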
Keywords :
data mining; pattern classification; pattern clustering; support vector machines; text analysis; classification performance boosting; clustering; co-training algorithm; extra knowledge discovery; linear SVM classifier; meta-feature; meta-information; small training sets; testing set; text classification; widely used datasets; Artificial intelligence; Clustering algorithms; Informatics; Labeling; Machine learning; Support vector machine classification; Support vector machines; Testing; Text categorization; Training data;
Conference_Title :
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3015-4
DOI :
10.1109/ICTAI.2007.146