DocumentCode :
3724164
Title :
GS-Orthogonalization Based "Basis Feature" Selection from Word Co-occurrence Matrix
Author :
Deqing Wang;Hui Zhang;Rui Liu
Author_Institution :
Sch. of Comput. Sci., Beihang Univ., Beijing, China
fYear :
2015
Firstpage :
1027
Lastpage :
1032
Abstract :
Feature selection plays an important role in machine learning applications. For text data in particular, the high-dimensional and sparse characteristics affect the performance of feature selection. In this paper, an unsupervised feature selection algorithm based on Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) over the word co-occurrence matrix is proposed. RP-GSO has three advantages: (1) it takes as input the dense word co-occurrence matrix, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" by the Gram-Schmidt process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. We conducted extensive experiments on two real-world text corpora and observed that RP-GSO achieves better performance than competing supervised and unsupervised methods in text classification and clustering tasks.
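The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the details, so the co-occurrence definition (`X.T @ X`), the Gaussian projection, the greedy largest-residual selection rule, and the names `select_basis_features`, `n_select`, and `n_proj` are all assumptions made for illustration.

```python
import numpy as np

def select_basis_features(X, n_select, n_proj=8, seed=0):
    """Hypothetical sketch: pick term indices via random projection + Gram-Schmidt.

    X is a dense (n_docs, n_terms) document-term array.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Assumed co-occurrence matrix: entry (i, j) is the inner product of
    # term columns i and j across documents.
    C = X.T @ X                                   # (n_terms, n_terms)

    # Gaussian random projection of each term's co-occurrence row to n_proj dims.
    R = rng.standard_normal((C.shape[1], n_proj)) / np.sqrt(n_proj)
    V = C @ R                                     # (n_terms, n_proj)

    available = np.ones(V.shape[0], dtype=bool)
    selected = []
    for _ in range(min(n_select, n_proj)):
        # Squared norm of each term's residual after removing selected directions.
        norms = np.einsum("ij,ij->i", V, V)
        norms[~available] = -1.0
        j = int(np.argmax(norms))
        if norms[j] <= 1e-12:                     # remaining terms are redundant
            break
        # Gram-Schmidt step: normalise the chosen residual and project it out
        # of every row, so later picks are orthogonal to earlier ones.
        q = V[j] / np.sqrt(norms[j])
        V = V - np.outer(V @ q, q)
        available[j] = False
        selected.append(j)
    return selected
```

In this sketch, a term whose co-occurrence profile duplicates an already selected term is left with a near-zero residual and is therefore not selected again, which is the intuition behind choosing mutually orthogonal "basis features".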
Keywords :
"Sparse matrices","Feature extraction","Training","Clustering algorithms","MATLAB","Computer science","Matrix decomposition"
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2015 IEEE International Conference on
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2015.80
Filename :
7373430