• DocumentCode
    390583
  • Title

    Text categorization based on concept indexing and principal component analysis

  • Author

    Ke, Huang ; Ma Shaoping

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    1
  • fYear
    2002
  • fDate
    28-31 Oct. 2002
  • Firstpage
    51
  • Abstract
    A major problem in text categorization is the high dimensionality of the feature vector space, which is commonly about ten thousand. Reducing the dimensionality of the space while maintaining categorization accuracy is useful for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm for reducing dimensionality that combines concept indexing and principal component analysis. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.
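    The abstract does not spell out how the two techniques are combined, so the following is only a minimal sketch under stated assumptions: concept indexing is taken to mean projecting each document onto unit-length class centroids, and principal component analysis is then applied to the resulting concept space. The function names (`concept_index`, `pca`) and the toy term-frequency matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def concept_index(X, y):
    """Project documents onto unit-length class centroids (assumed form of concept indexing)."""
    classes = np.unique(y)
    # One centroid ("concept vector") per class, scaled to unit length.
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return X @ centroids.T, centroids

def pca(Z, n_components):
    """Project the rows of Z onto its top principal components."""
    mean = Z.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    components = Vt[:n_components]
    return (Z - mean) @ components.T, mean, components

# Toy term-frequency matrix: 6 documents, 5 terms, 3 classes (made-up data).
X = np.array([
    [3, 1, 0, 0, 0],
    [2, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 1, 2, 2, 0],
    [0, 0, 0, 1, 3],
    [1, 0, 0, 2, 2],
], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])

Z, centroids = concept_index(X, y)   # 5 terms -> 3 concept dimensions
R, mean, components = pca(Z, 2)      # 3 concept dimensions -> 2
```

    In this sketch the first step shrinks the feature space to one dimension per category, and the second step keeps only the directions of greatest variance within that smaller space. A new document would be reduced the same way, by multiplying with `centroids.T`, subtracting `mean`, and multiplying with `components.T`.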
  • Keywords
    database indexing; information retrieval; principal component analysis; categorization effectiveness; concept indexing; feature selection methods; feature vector space; text categorization; Classification tree analysis; Computer science; Indexing; Intelligent systems; Internet; Prototypes; Space technology; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering
  • Print_ISBN
    0-7803-7490-8
  • Type

    conf

  • DOI
    10.1109/TENCON.2002.1181212
  • Filename
    1181212