DocumentCode :
162599
Title :
Indian Language Text Representation and Categorization Using Supervised Learning Algorithm
Author :
Swamy, M. Narayana ; Hanumanthappa, M. ; Jyothi, N.M.
Author_Institution :
Dept. of Comput. Applic., Presidency Coll., Bangalore, India
fYear :
2014
fDate :
6-7 March 2014
Firstpage :
406
Lastpage :
410
Abstract :
The Constitution of India allows each state to choose its own official language for official communication at the state level. The amount of textual data available in electronic form in the various Indian regional languages is increasing at an accelerating pace, so classifying text documents by language is essential. The objective of this work is the representation and categorization of Indian language text documents using text mining techniques. Several such techniques are used for text categorization: the naive Bayes classifier, the k-nearest-neighbor classifier, and decision trees.
Keywords :
data mining; decision trees; learning (artificial intelligence); natural language processing; pattern classification; text analysis; Indian language text categorization; Indian language text document categorization; Indian language text document representation; Indian language text representation; Indian regional languages; decision tree; k-nearest-neighbor classifier; naive Bayes classifier; supervised learning algorithm; text document classification; text mining techniques; Classification algorithms; Decision trees; Educational institutions; Support vector machine classification; Text categorization; Text mining; Vectors; Bayes classifier; Decision tree; F-measure; Lemmatization or Stemming; Stop words; Tokens; Vector Space Model; Zipf's law; k-Neighbor classifier; precision (p); recall (r)
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computing Applications (ICICA), 2014 International Conference on
Conference_Location :
Coimbatore
Type :
conf
DOI :
10.1109/ICICA.2014.89
Filename :
6965081