Title :
Active learning for text classification: Using the LSI Subspace Signature Model
Author :
Weizhong Zhu ; Allen, Robert B.
Author_Institution :
Dept. of Inf. Sci., City of Hope Med. Center, Los Angeles, CA, USA
Abstract :
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare and expensive to create. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures, defined by the distribution of their local statistical contributions across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to their global statistical contribution ranking in the corresponding LSI subspaces, without prior knowledge of labels or dependence on the model-loss functions of the classifiers. These sample subsets also effectively maintain the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets with the optimized term subsets substantially improve the learning accuracy across three standard classifiers.
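The selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact LSISSM formulas, so everything below is an assumption made for illustration: the "statistical contribution" of an item to a latent dimension is taken to be its squared, singular-value-scaled coordinate; the "signature" is that contribution normalized per item; and the global ranking is the total contribution over the top k dimensions. The function name rank_by_lsi_contribution and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np


def rank_by_lsi_contribution(X, k, n_docs, n_terms):
    """Rank documents and terms by an assumed contribution score in LSI space.

    X is a term-by-document matrix (rows = terms, columns = documents).
    NOTE: the scoring rule is an illustrative assumption, not the
    published LSISSM definition.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))

    # Coordinates in the top-k latent dimensions, scaled by singular values.
    term_coords = U[:, :k] * s[:k]       # shape: (terms, k)
    doc_coords = Vt[:k, :].T * s[:k]     # shape: (docs, k)

    # Assumed per-dimension contribution: squared coordinate.
    term_contrib = term_coords ** 2
    doc_contrib = doc_coords ** 2

    # Assumed signature: each document's contribution distribution
    # across the k dimensions (rows sum to 1 when the total is nonzero).
    totals = doc_contrib.sum(axis=1, keepdims=True)
    doc_signatures = doc_contrib / np.maximum(totals, 1e-12)

    # Assumed global score: total contribution over the k dimensions.
    doc_score = doc_contrib.sum(axis=1)
    term_score = term_contrib.sum(axis=1)

    # Highest-scoring items first; no labels or classifier loss involved.
    top_docs = np.argsort(-doc_score)[:n_docs]
    top_terms = np.argsort(-term_score)[:n_terms]
    return top_docs, top_terms, doc_signatures
```

Under these assumptions, the returned document indices would be the candidates sent for labeling, and the returned term indices would define the reduced feature set passed to a downstream classifier.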
Keywords :
indexing; learning (artificial intelligence); pattern classification; sampling methods; singular value decomposition; text analysis; LSI subspace signature model; LSISSM; SVD; active learning; dimension reduction; latent semantic indexing subspace signature model; learning accuracy; loss function; sampling distribution; semantic signature; singular value decomposition; statistical contribution ranking; supervised learning method; text classification; top-ranking LSI latent dimension; unlabeled text corpus; unstructured text; Accuracy; Classification algorithms; Large scale integration; Matrix decomposition; Semantics; Text categorization; Training; Latent Semantic Indexing Subspace Signature Model; active learning; classifiers; text categorization;
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
DOI :
10.1109/DSAA.2014.7058066