DocumentCode :
3508767
Title :
Word-Text Matrix Feature Selection in Chinese Text Classfication Based on LSI
Author :
Gu, Yijun ; Wang, Rong ; Wang, Jianhua
Author_Institution :
Coll. of Inf. Security & Eng., Chinese People's Public Security Univ., Beijing
Volume :
3
fYear :
2009
fDate :
7-8 March 2009
Firstpage :
808
Lastpage :
811
Abstract :
Latent semantic indexing (LSI) can be regarded as a mapping of the vector space model (VSM). By applying singular value decomposition to the word-text matrix of the original text set, the relationships among the latent concepts in the document set can be computed. Representing the concept space by a set of latent concepts reduces the ambiguity of concept expression and avoids the VSM assumption that the concepts along different dimensions are orthogonal. This paper studies how four feature selection methods (Information Gain, Cross Entropy, Odds Ratio, and Union Odds Ratio), each used to reduce the dimensionality of the word-document matrix, affect LSI-based classification of Chinese text. The experimental results show that reducing the dimensionality of the word-text matrix with Union Odds Ratio gives better classification results than the other three methods.
Keywords :
classification; matrix algebra; natural language processing; singular value decomposition; text analysis; Chinese text classification; LSI; cross entropy; information gain; latent semantic index; singular value decomposition; union odds ratio; vector space model; word-text matrix feature selection; Computer science; Computer science education; Educational institutions; Educational technology; Information security; Large scale integration; Matrix decomposition; Singular value decomposition; Statistics; Text categorization; Feature selection; LSI; Text Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-1-4244-3581-4
Type :
conf
DOI :
10.1109/ETCS.2009.716
Filename :
4959433