• DocumentCode
    2430901
  • Title
    Automated text content identification for document processing using a kernel-based support Vector Selection approach
  • Author
    Benveniste, Steven M. ; Fargues, Monique P.
  • Author_Institution
    ECE Dept., Naval Postgrad. Sch., Monterey, CA, USA
  • fYear
    2009
  • fDate
    1-4 Nov. 2009
  • Firstpage
    366
  • Lastpage
    370
  • Abstract
    Automated text analysis and mining tools designed to identify the main topics of texts, chat room discussions, and web postings are an increasingly active research area due to the rapid explosion of Web information. This paper applies the nonlinear kernel-based Feature Vector Selection (FVS) approach followed by a Linear Discriminant Analysis (LDA) step to categorize unstructured text documents. Results are compared to those obtained using the Latent Semantic Analysis (LSA) approach commonly used in text categorization applications. Overall results, taking into account classification performance and computational load, show that the FVS-LDA implemented with a polynomial kernel of degree 1 and an added constant of 1 is the best classifier for the database considered.
  • Keywords
    data mining; pattern classification; support vector machines; text analysis; FVS-LDA approach; document processing; kernel-based feature vector selection; kernel-based support vector selection; latent semantic analysis; polynomial kernel; text categorization; text content identification; text mining; Adaptive filters; Computational efficiency; Data mining; Databases; Explosions; Information retrieval; Kernel; Linear discriminant analysis; Machine learning; Text categorization; Data Mining; Feature Vector Selection (FVS); Kernel Based Schemes; Singular Value Decomposition (SVD); Text Categorization; Text Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on
  • Conference_Location
    Pacific Grove, CA
  • ISSN
    1058-6393
  • Print_ISBN
    978-1-4244-5825-7
  • Type
    conf
  • DOI
    10.1109/ACSSC.2009.5469831
  • Filename
    5469831
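
The abstract singles out a polynomial kernel of degree 1 with an added constant of 1. As a minimal sketch of what such a kernel computes, assuming the common form k(x, y) = (x . y + c)^d (the record does not state the exact formula), the snippet below builds a kernel matrix for a few documents. The function names and the toy term-count vectors are hypothetical and are not taken from the paper; the FVS and LDA steps themselves are not shown.

```python
# Sketch only: assumes the common polynomial kernel form k(x, y) = (x . y + c) ** d.
# Function names and the toy term-count vectors are hypothetical illustrations.

def polynomial_kernel(x, y, degree=1, constant=1.0):
    """Return (x . y + constant) ** degree for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + constant) ** degree


def gram_matrix(docs, degree=1, constant=1.0):
    """Return the kernel matrix K with K[i][j] = k(docs[i], docs[j])."""
    return [[polynomial_kernel(a, b, degree, constant) for b in docs] for a in docs]


# Made-up term-count vectors for three documents.
docs = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]
K = gram_matrix(docs)  # defaults: degree 1, added constant 1
```

With degree 1, each entry is simply the dot product of two document vectors plus the constant, so under this assumed form the kernel is an affine function of the dot product, which may help explain the low computational load the abstract mentions.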