• DocumentCode
    2851576
  • Title
    Supervised latent semantic indexing for document categorization
  • Author
    Sun, Jian-Tao ; Chen, Zheng ; Zeng, Hua-Jun ; Lu, Yu-Chang ; Shi, Chun-yi ; Ma, Wei-Ying
  • Author_Institution
    Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
  • fYear
    2004
  • fDate
    1-4 Nov. 2004
  • Firstpage
    535
  • Lastpage
    538
  • Abstract
    Latent semantic indexing (LSI) is a successful information retrieval (IR) technique that attempts to capture the latent semantics implied by a query or a document by representing them in a dimension-reduced space. However, LSI is not optimal for document categorization because it seeks the most representative features for document representation rather than the most discriminative ones. In this paper, we propose supervised LSI (SLSI), which iteratively selects the most discriminative basis vectors using the training data. The extracted vectors are then used to project the documents into a reduced-dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results.
  • Keywords
    document handling; indexing; dimension-reduced space; discriminative basis vectors; document categorization; document representation; information retrieval; supervised latent semantic indexing; Asia; Computer science; Data mining; Indexing; Information retrieval; Large scale integration; Singular value decomposition; Space technology; Sun; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
  • Print_ISBN
    0-7695-2142-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2004.10004
  • Filename
    1410354