DocumentCode
162599
Title
Indian Language Text Representation and Categorization Using Supervised Learning Algorithm
Author
Swamy, M. Narayana ; Hanumanthappa, M. ; Jyothi, N.M.
Author_Institution
Dept. of Comput. Applic., Presidency Coll., Bangalore, India
fYear
2014
fDate
6-7 March 2014
Firstpage
406
Lastpage
410
Abstract
The Constitution of India allows each Indian state to choose its own official language for official communication at the state level. The amount of textual data available in electronic form in the various Indian regional languages is increasing rapidly, so classifying text documents by language is essential. The objective of this work is the representation and categorization of Indian language text documents using text mining techniques. Several text mining techniques for text categorization are used, including the naive Bayes classifier, the k-nearest-neighbor classifier, and decision trees.
Keywords
data mining; decision trees; learning (artificial intelligence); natural language processing; pattern classification; text analysis; Indian language text categorization; Indian language text document categorization; Indian language text document representation; Indian language text representation; Indian regional languages; decision tree; k-nearest-neighbor classifier; naive Bayes classifier; supervised learning algorithm; text document classification; text mining techniques; Classification algorithms; Decision trees; Educational institutions; Support vector machine classification; Text categorization; Text mining; Vectors; Bayes classifier; Decision tree; F-measure; Lemmatization or Stemming; Stop words; Tokens; Vector Space Model; Zipf's law; k-Neighbor classifier; precision (p); recall (r)
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computing Applications (ICICA), 2014 International Conference on
Conference_Location
Coimbatore
Type
conf
DOI
10.1109/ICICA.2014.89
Filename
6965081