Title :
Label-based semi-supervised fuzzy co-clustering for document categorization
Author :
Yan, Yang ; Chen, Lihui
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data at the initial state, as well as the use of constraints generated from the labels during the clustering process. We formulate the clustering process as a constrained optimization problem and propose a novel semi-supervised fuzzy co-clustering algorithm that incorporates a few category labels to handle large, overlapping text corpora. Simulations on a few large benchmark datasets demonstrate the strength and potential of this new approach in terms of accuracy, stability and efficiency with limited labels, compared with some existing label-based semi-supervised clustering algorithms.
Keywords :
data analysis; fuzzy set theory; learning (artificial intelligence); optimisation; pattern clustering; text analysis; constrained optimization problem; document categorization; label-based semi-supervised fuzzy co-clustering; labeled data; large overlapping text corpus; machine learning technique; unlabeled data; Accuracy; Algorithm design and analysis; Benchmark testing; Clustering algorithms; Complexity theory; Optimization; Partitioning algorithms; class labels; document categorization; fuzzy co-clustering; semi-supervised clustering;
Conference_Title :
Information, Communications and Signal Processing (ICICS) 2011 8th International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4577-0029-3
DOI :
10.1109/ICICS.2011.6173605