• DocumentCode
    3101629
  • Title
    Approaches of Dimensionality Reduction for Telugu Document Classification
  • Author
    Reddy, Vijayapal P. ; Sasidhar, B. ; Reddy, Harinatha B. ; Vardhan, Vishnu B. ; Reddy, Pratap L. ; Govardhan, A.
  • Author_Institution
    Dept. of CSE, Rajamahendra Coll. of Eng., Ibrahimpatnam, India
  • fYear
    2009
  • fDate
    7-9 Dec. 2009
  • Firstpage
    259
  • Lastpage
    264
  • Abstract
    Document classification is a prominent area of research that has evolved as a result of the exponential growth in the usage of electronic documents. Classifying documents demands an understanding of document units, which involves removing insignificant data and improving computational efficiency. This paper deals with approaches to dimensionality reduction (DR) in document units for Telugu. Bag of words is a generic model for English document classification; adapting this model to Indic-based scripts is found to yield meager performance. Two approaches are presented in this paper. The first is a language-specific, corpus-based dimensionality reduction termed validity-based DR. The other is a category- and document-specific approach termed category-based DR. The performance of the two approaches is evaluated using accuracy as the measure.
  • Keywords
    document handling; pattern classification; English document classification; Indic based scripts; Telugu document classification; category specific approach; corpus based dimensionality reduction; document specific approach; electronic documents; Adaptation model; Computational efficiency; Information retrieval; Knowledge engineering; Labeling; Machine learning; Machine learning algorithms; Natural languages; Text categorization; Training data; Classification; Dimensionality Reduction; Indic Scripts;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing, 2009. IALP '09. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3904-1
  • Type
    conf
  • DOI
    10.1109/IALP.2009.82
  • Filename
    5380745