DocumentCode :
3251501
Title :
Evaluating the utility of statistical phrases and latent semantic indexing for text classification
Author :
Wu, Huiwen ; Gunopulos, Dimitrios
Author_Institution :
Comput. Sci. & Eng. Dept., California Univ., Riverside, CA, USA
fYear :
2002
fDate :
2002
Firstpage :
713
Lastpage :
716
Abstract :
The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We find that adding phrase terms yields very limited improvement when single-word terms already achieve good performance, even when SVD/LSI is used as the dimensionality reduction method.
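The abstract distinguishes adjacent phrases (consecutive word pairs) from window phrases (word pairs that co-occur within a short span) and adds the selected phrases as extra terms alongside single words. The sketch below illustrates that idea under stated assumptions. The abstract does not give the paper's statistical selection criterion, so a simple document-frequency threshold (`min_df`) is assumed here. All function names are hypothetical, and the SVD/LSI reduction step is not shown.

```python
from collections import Counter


def adjacent_phrases(tokens):
    """Ordered pairs of consecutive tokens, e.g. ('vector', 'space')."""
    return list(zip(tokens, tokens[1:]))


def window_phrases(tokens, window=3):
    """Unordered token pairs whose positions differ by at most window - 1.

    Assumption: pairs are order-independent (sorted), and identical tokens
    are not paired with themselves.
    """
    pairs = []
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                pairs.append(tuple(sorted((a, b))))
    return pairs


def select_phrases(docs, extractor, min_df=2):
    """Keep phrases occurring in at least `min_df` documents (hypothetical criterion)."""
    df = Counter()
    for tokens in docs:
        df.update(set(extractor(tokens)))  # count each phrase once per document
    return {p for p, n in df.items() if n >= min_df}


def term_vector(tokens, phrases, extractor):
    """Term-frequency vector mixing single-word terms and selected phrase terms."""
    counts = Counter(tokens)
    counts.update(p for p in extractor(tokens) if p in phrases)
    return counts


if __name__ == "__main__":
    docs = [
        "support vector machine classifies text".split(),
        "a support vector machine for text".split(),
        "latent semantic indexing of text".split(),
    ]
    phrases = select_phrases(docs, adjacent_phrases, min_df=2)
    print(sorted(phrases))
    print(term_vector(docs[0], phrases, adjacent_phrases))
```

With the three toy documents, only `('support', 'vector')` and `('vector', 'machine')` occur in two or more documents, so they are the only phrase terms added to the first document's vector. Sparse phrase features like these are what the paper reports add little once single-word terms already perform well.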
Keywords :
classification; indexing; information retrieval; statistical analysis; text analysis; adjacent phrase extraction; dimensionality reduction method; latent semantic indexing; single-word terms; statistical phrases; term-based vector space model; text classification; textual information retrieval; vector-based document classification; window phrase extraction; Computer science; Data mining; Dictionaries; Frequency; Indexing; Information retrieval; Large scale integration; Support vector machine classification; Support vector machines; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1184036
Filename :
1184036