• DocumentCode
    162599
  • Title
    Indian Language Text Representation and Categorization Using Supervised Learning Algorithm
  • Author
    Swamy, M. Narayana ; Hanumanthappa, M. ; Jyothi, N.M.
  • Author_Institution
    Dept. of Comput. Applic., Presidency Coll., Bangalore, India
  • fYear
    2014
  • fDate
    6-7 March 2014
  • Firstpage
    406
  • Lastpage
    410
  • Abstract
    The Constitution of India allows each Indian state to choose its own official language for official communication at the state level. The amount of textual data available in electronic form in the various Indian regional languages is constantly increasing, so classifying text documents by language is essential. The objective of this work is the representation and categorization of Indian language text documents using text mining techniques. Several text mining techniques are used for text categorization: the naive Bayes classifier, the k-nearest-neighbor classifier, and the decision tree.
  • Keywords
    data mining; decision trees; learning (artificial intelligence); natural language processing; pattern classification; text analysis; Indian language text categorization; Indian language text document categorization; Indian language text document representation; Indian language text representation; Indian regional languages; decision tree; k-nearest-neighbor classifier; naive Bayes classifier; supervised learning algorithm; text document classification; text mining techniques; Classification algorithms; Decision trees; Educational institutions; Support vector machine classification; Text categorization; Text mining; Vectors; Bayes classifier; Decision tree; F-measure; Lemmatization or Stemming; Stop words; Tokens; Vector Space Model; Zipf's law; k-Neighbor classifier; precision (p); recall (r)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing Applications (ICICA), 2014 International Conference on
  • Conference_Location
    Coimbatore
  • Type
    conf
  • DOI
    10.1109/ICICA.2014.89
  • Filename
    6965081