Title :
Automatic classification of academic documents using text mining techniques
Author :
Nunez, Haydemar ; Ramos, Edgar
Author_Institution :
Lab. de Intel. Artificial. Escuela de Comput., Univ. Central de Venezuela, Caracas, Venezuela
Abstract :
This work presents an automatic classifier of undergraduate final projects based on text mining. The dataset, comprising documents from four professional categories, was represented by means of the vector space model with different index metrics. A number of dimensionality reduction techniques were also applied to the word space. The classification model was built with the K-nearest neighbor algorithm. Using 10-fold cross-validation, we obtained a predictive accuracy of 82%. However, accuracy reached 95% when the classifier recommended up to two categories, which takes into account the interdisciplinary nature of the documents. The classifier was integrated into an application for the automatic assignment of reviewers, which selects reviewers from among the teachers who belong to the recommended areas.
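The pipeline outlined in the abstract (vector space representation, K-nearest neighbor classification, recommendation of up to two categories) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes TF-IDF as the index metric, cosine similarity, similarity-weighted voting, and no dimensionality reduction, none of which the record confirms. The category labels and toy documents are invented for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption: no stemming or stop-word removal).
    return text.lower().split()

def fit_idf(docs):
    # Inverse document frequency of each term over the tokenized training docs.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf_vector(tokens, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def recommend(query, train_vecs, train_labels, idf, k=3, top=2):
    # Rank categories by the summed similarity of the k nearest neighbors
    # and return up to `top` of them.
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, v), lab) for v, lab in zip(train_vecs, train_labels)]
    nearest = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    votes = Counter()
    for sim, lab in nearest:
        votes[lab] += sim
    return [lab for lab, _ in votes.most_common(top)]

# Hypothetical toy corpus: labels and texts are NOT from the paper.
TRAIN = [
    ("relational database query index sql transaction", "databases"),
    ("sql schema database storage index", "databases"),
    ("rendering shader mesh texture graphics", "graphics"),
    ("mesh animation rendering lighting graphics", "graphics"),
    ("router packet protocol network latency", "networks"),
    ("network protocol congestion packet routing", "networks"),
    ("neural classifier training learning model", "ai"),
    ("learning agent search heuristic model", "ai"),
]

tokenized = [tokenize(text) for text, _ in TRAIN]
labels = [lab for _, lab in TRAIN]
idf = fit_idf(tokenized)
vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]

result = recommend(
    "sql query optimization for a database using a learning model",
    vectors, labels, idf,
)
print(result)
```

On this toy corpus the query mixes vocabulary from two areas, so the sketch returns a primary category plus a second one, which mirrors the idea of recommending up to two categories for interdisciplinary documents. A reviewer-assignment step would then draw on teachers from the returned areas.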
Keywords :
data mining; data reduction; educational administrative data processing; further education; pattern classification; text analysis; 10-fold cross-validations; K-nearest neighbor algorithm; automatic academic document classification; automatic classifier; automatic reviewer assignment; dimensionality reduction techniques; index metrics; predictive accuracy; professional categories; text mining techniques; undergraduate final projects; vector space model; word space; Accuracy; Chebyshev approximation; Classification algorithms; Computational modeling; Laboratories; Text mining; Vectors; K nearest neighbor algorithm; Text mining; classification models; documents categorization;
Conference_Title :
Informatica (CLEI), 2012 XXXVIII Conferencia Latinoamericana En
Conference_Location :
Medellin
Print_ISBN :
978-1-4673-0794-9
DOI :
10.1109/CLEI.2012.6427167