DocumentCode
3101629
Title
Approaches of Dimensionality Reduction for Telugu Document Classification
Author
Reddy, Vijayapal P. ; Sasidhar, B. ; Reddy, Harinatha B. ; Vardhan, Vishnu B. ; Reddy, Pratap L. ; Govardhan, A.
Author_Institution
Dept. of CSE, Rajamahendra Coll. of Eng., Ibrahimpatnam, India
fYear
2009
fDate
7-9 Dec. 2009
Firstpage
259
Lastpage
264
Abstract
Document classification is a prominent area of research that has evolved as a result of the exponential growth in the usage of electronic documents. Classification of documents demands an understanding of document units, obtained by removing insignificant data and improving computational efficiency. This paper deals with approaches to dimensionality reduction (DR) in document units for Telugu. Bag of words is a generic model for English document classification, but adapting this model to Indic scripts has been found to give meager performance. Two approaches are presented in this paper. The first is a language-specific, corpus-based dimensionality reduction termed validity-based DR. The second is a category- and document-specific approach termed category-based DR. The performance of the two approaches is evaluated using accuracy as the measure.
Keywords
document handling; pattern classification; English document classification; Indic based scripts; Telugu document classification; category specific approach; corpus based dimensionality reduction; document specific approach; electronic documents; Adaptation model; Computational efficiency; Information retrieval; Knowledge engineering; Labeling; Machine learning; Machine learning algorithms; Natural languages; Text categorization; Training data; Classification; Dimensionality Reduction; Indic Scripts;
fLanguage
English
Publisher
ieee
Conference_Titel
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location
Singapore
Print_ISBN
978-0-7695-3904-1
Type
conf
DOI
10.1109/IALP.2009.82
Filename
5380745