Title :
Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules
Author :
Lu Thi Kim Phung;Vo Thi Ngoc Chau;Nguyen Hua Phung
Author_Institution :
Fac. of Comput. Sci. &
Abstract :
Early detection of in-trouble students in an academic credit system is an emerging problem in educational data mining research. This problem has been treated as a multi-class educational data classification task. Although many existing supervised learning algorithms can provide acceptable classification models, the interpretability of these models needs to be investigated before they can be applied in practice. Random forests, in particular, have been examined and appear to be an appropriate solution for classifying students effectively for early in-trouble student detection in a credit system. However, random forests are black-box ensemble models that cannot explain the reasoning behind their predictions. Therefore, in this paper, we define a rule extraction algorithm named ExtractingRuleRF to derive an interpretable refined classification rule set from a random forest for a multi-class data classification task. The proposed algorithm follows a greedy approach with two phases: rule refinement and rule extraction. In the first phase, we prepare a ranked, weighted rule set that is more interpretable than the input random forest and has equivalent classification power, because it retains the forest's classification scheme. In the second phase, our rule extraction process selects the best rules, according to the priority of each ranked rule, to achieve the highest accuracy and/or full coverage. The theoretical analysis of the algorithm and experimental results on real educational data sets show that ExtractingRuleRF can produce a more effective and interpretable rule-based classification model than its corresponding random forest. This result makes our knowledge-based educational decision support with interpretable classification rules more practical.
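The two-phase idea described in the abstract (rank weighted rules, then greedily keep the best ones until the data is covered) can be illustrated with a minimal Python sketch. This is a generic greedy covering procedure written for illustration only, not the ExtractingRuleRF algorithm itself, whose details are not given in this record. The `Rule` class, the functions `rank_rules`, `extract_rules`, and `predict`, the accuracy-then-coverage weighting, and the toy data are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format: each condition is (feature_index, op, threshold)
    # with op in {"<=", ">"}, as one might read off a root-to-leaf tree path.
    conditions: tuple
    label: str

    def covers(self, x):
        """Return True when every condition holds for sample x."""
        return all(
            (x[i] <= t) if op == "<=" else (x[i] > t)
            for i, op, t in self.conditions
        )


def rank_rules(rules, X, y):
    """Phase 1 stand-in: weight rules by (accuracy, coverage) and sort them."""
    scored = []
    for rule in rules:
        covered = [label for x, label in zip(X, y) if rule.covers(x)]
        if not covered:
            continue  # a rule that fires on no sample is dropped
        accuracy = sum(label == rule.label for label in covered) / len(covered)
        scored.append((accuracy, len(covered), rule))
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [rule for _, _, rule in scored]


def extract_rules(ranked, X):
    """Phase 2 stand-in: greedily keep rules that cover uncovered samples."""
    uncovered = set(range(len(X)))
    kept = []
    for rule in ranked:
        newly = {i for i in uncovered if rule.covers(X[i])}
        if newly:
            kept.append(rule)
            uncovered -= newly
        if not uncovered:
            break  # full coverage reached
    return kept


def predict(kept, x, default=None):
    """Classify x with the first matching rule in priority order."""
    for rule in kept:
        if rule.covers(x):
            return rule.label
    return default


# Toy data (made up): features are [grade average, failed courses].
X = [[3.5, 0], [3.0, 1], [2.0, 3], [1.5, 5], [2.4, 2]]
y = ["ok", "ok", "risk", "risk", "risk"]
candidate_rules = [
    Rule(((0, ">", 2.5),), "ok"),
    Rule(((0, "<=", 2.5),), "risk"),
    Rule(((1, ">", 0),), "risk"),
]
kept_rules = extract_rules(rank_rules(candidate_rules, X, y), X)
print(len(kept_rules), predict(kept_rules, [3.2, 0]))
```

In this sketch the candidate rules are written by hand; in practice they would be the root-to-leaf paths of the trees in a trained forest. The first-match prediction scheme is also a simplification chosen for brevity.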
Keywords :
"Classification algorithms","Data mining","Data models","Vegetation","Decision trees","Cities and towns","Prediction algorithms"
Conference_Title :
Advanced Computing and Applications (ACOMP), 2015 International Conference on
DOI :
10.1109/ACOMP.2015.13