Title :
Multi-class multi-tag classifier system for StackOverflow questions
Author :
José R. Cedeño González;Juan J. Flores Romero;Mario Graff Guerrero;Felix Calderón
Author_Institution :
Departamento de Estudios de Posgrado de la Facultad de Ingeniería Eléctrica, Universidad Michoacana de San Nicolás de Hidalgo, Mexico
Abstract :
This work addresses the text document classification problem posed by the contest “Identify Keywords and Tags from Millions of Text Questions”, published on the Kaggle website. Using data from the StackOverflow website, the task is to predict the tags assigned to questions. The categorization is multi-class and multi-tag: a question can belong to different topics and can also carry several tags. To solve this problem, we propose a 5-way multi-class classifier system. We discuss the results obtained with this classification scheme by analysing several score metrics of the classifier system. The 5-way classifier system obtained competitive results, with F1 scores ranging from 0.59 to 0.76. The main contribution of this paper lies in the preprocessing (which implements the feature extraction phase) and in the multi-tag multi-class classification scheme.
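To make the F1 figures above concrete, the following is a minimal sketch of how an F1 score can be computed for multi-tag predictions. It is not the authors' evaluation code: the abstract does not say how F1 was averaged, so per-question averaging is an assumption here, and the function names and example tags are hypothetical.

```python
def f1_for_question(true_tags, predicted_tags):
    """F1 score between the true and predicted tag sets of one question."""
    true_set, pred_set = set(true_tags), set(predicted_tags)
    hits = len(true_set & pred_set)  # tags predicted correctly
    if hits == 0:
        # Covers empty predictions and predictions with no correct tag.
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(true_set)
    return 2 * precision * recall / (precision + recall)


def mean_f1(all_true, all_predicted):
    """Average the per-question F1 over all questions (assumed averaging)."""
    scores = [f1_for_question(t, p) for t, p in zip(all_true, all_predicted)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical questions, each with several tags (illustrative data only).
true_tags = [["python", "pandas"], ["java"], ["c++", "templates", "stl"]]
predicted = [["python"], ["java"], ["c++", "stl", "boost"]]

score = mean_f1(true_tags, predicted)
print(f"mean F1 = {score:.3f}")
```

Treating each question's tags as a set keeps the metric independent of tag order, and a question with no correctly predicted tag contributes an F1 of zero.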
Keywords :
"Training","Feature extraction","Support vector machines","Classification algorithms","Prediction algorithms","Vocabulary","Web sites"
Conference_Title :
Power, Electronics and Computing (ROPEC), 2015 IEEE International Autumn Meeting on
DOI :
10.1109/ROPEC.2015.7395121