Text categorization based on concept indexing and principal component analysis

Author

Ke, Huang ; Ma Shaoping

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

Volume

1

fYear

2002

fDate

28-31 Oct. 2002

Firstpage

51

Abstract

A major problem in text categorization is the high dimensionality of the feature vector space, which commonly reaches about ten thousand dimensions. Reducing the dimensionality of the space while preserving categorization accuracy is useful both for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm for dimensionality reduction that combines concept indexing with principal component analysis. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.
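
As an illustration of how the two techniques named in the abstract could be chained, the following is a minimal Python sketch, not the authors' implementation. It assumes (the abstract does not say) that supervised concept indexing is applied first, representing each document by its cosine similarity to each class centroid, and that principal component analysis is then run on the resulting concept space. The toy term-count matrix, the labels, and the function names `concept_index` and `pca` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def concept_index(docs, labels):
    """Assumed supervised concept indexing: one unit-length centroid per class,
    each document re-expressed as its cosine similarity to every centroid."""
    docs = [normalize(d) for d in docs]
    centroids = []
    for c in sorted(set(labels)):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        centroids.append(normalize(mean))
    return [[sum(a * b for a, b in zip(d, c)) for c in centroids] for d in docs]


def pca(rows, k, iters=200):
    """Project centered rows onto the top-k principal components,
    found by power iteration with deflation of the covariance matrix."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    X = [[x - m for x, m in zip(r, mean)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    comps = []
    for _ in range(k):
        v = normalize([1.0 + i for i in range(dim)])  # deterministic start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            if all(abs(x) < 1e-12 for x in w):
                break  # no variance left to extract
            v = normalize(w)
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                  for i in range(dim))
        comps.append(v)
        # Deflate: remove the variance explained by this component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in X]


# Hypothetical toy data: 6 documents over 5 terms, 3 categories.
docs = [
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 0, 1, 2],
    [0, 1, 0, 0, 2],
]
labels = ["a", "a", "b", "b", "c", "c"]

concepts = concept_index(docs, labels)  # 5 term dimensions -> 3 concept dimensions
reduced = pca(concepts, k=2)            # 3 concept dimensions -> 2 components
print(reduced)
```

In this sketch the concept-indexing step caps the dimensionality at the number of categories, and PCA then compresses that space further. On a real corpus the term vectors would typically be TF-IDF weights rather than raw counts, and a numerical library would replace the hand-written power iteration.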

Keywords

database indexing; information retrieval; principal component analysis; categorization effectiveness; concept indexing; feature selection methods; feature vector space; text categorization; Classification tree analysis; Computer science; Indexing; Intelligent systems; Internet; Prototypes; Space technology; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering

Print_ISBN

0-7803-7490-8

Type

conf

DOI

10.1109/TENCON.2002.1181212

Filename

1181212