• DocumentCode
    3756174
  • Title

    Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules

  • Author

    Lu Thi Kim Phung;Vo Thi Ngoc Chau;Nguyen Hua Phung

  • Author_Institution
    Fac. of Comput. Sci. &
  • fYear
    2015
  • Firstpage
    20
  • Lastpage
    27
  • Abstract
    Early detection of in-trouble students in an academic credit system is an emerging problem in educational data mining research. It has been treated as a multi-class educational data classification task. Although many existing supervised learning algorithms can provide acceptable classification models, the interpretability of these models must be investigated before they can be applied in practice. Random forests have been examined and appear to be an appropriate solution for effectively classifying students for early in-trouble student detection in a credit system. However, random forests are black-box ensemble models that cannot explain the reasoning behind their predictions. Therefore, in this paper, we define a rule extraction algorithm named ExtractingRuleRF to derive an interpretable refined classification rule set from a random forest for a multi-class data classification task. The proposed algorithm follows a greedy approach with two phases: rule refinement and rule extraction. In the first phase, we prepare a ranked weighted rule set that is more interpretable than the input random forest and has equivalent classification power, by retaining the forest's classification scheme. In the second phase, our rule extraction process returns the best rules for the highest accuracy and/or full coverage based on the priority of each ranked rule. The theoretical analysis of the algorithm and experimental results on real educational data sets show that ExtractingRuleRF can produce a more effective and interpretable rule-based classification model than its corresponding random forest. This result makes our knowledge-based educational decision support more practical by supplying interpretable classification rules.
  • Keywords
    "Classification algorithms","Data mining","Data models","Vegetation","Decision trees","Cities and towns","Prediction algorithms"
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing and Applications (ACOMP), 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/ACOMP.2015.13
  • Filename
    7422370
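
The abstract describes the second phase only at a high level: rules are taken in priority order until accuracy and/or coverage is satisfied. The sketch below is not ExtractingRuleRF itself, whose details the abstract does not give. It is a generic greedy, rank-ordered rule selection of the kind the abstract suggests. The `Rule` fields, the `weight` score, the majority-correct acceptance test, and the toy student data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Sample = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Sample], bool]  # antecedent of the rule
    label: str                           # class predicted when the rule fires
    weight: float                        # hypothetical priority score


def extract_rules(
    rules: List[Rule], samples: List[Sample], labels: List[str]
) -> Tuple[List[Rule], Set[int]]:
    """Greedily keep ranked rules until every sample is covered."""
    ranked = sorted(rules, key=lambda r: r.weight, reverse=True)
    uncovered = set(range(len(samples)))
    selected: List[Rule] = []
    for rule in ranked:
        if not uncovered:
            break
        newly = {i for i in uncovered if rule.condition(samples[i])}
        correct = {i for i in newly if labels[i] == rule.label}
        # Assumed acceptance test: the rule must cover something new and
        # be right on at least half of what it newly covers.
        if newly and 2 * len(correct) >= len(newly):
            selected.append(rule)
            uncovered -= newly
    return selected, uncovered


def classify(selected: List[Rule], sample: Sample, default: str) -> str:
    """Return the label of the first selected rule that fires."""
    for rule in selected:
        if rule.condition(sample):
            return rule.label
    return default


# Toy data, invented for this example only.
samples = [
    {"gpa": 1.5, "credits": 8},
    {"gpa": 3.5, "credits": 20},
    {"gpa": 2.5, "credits": 6},
    {"gpa": 2.5, "credits": 15},
]
labels = ["in_trouble", "good", "in_trouble", "average"]

rules = [
    Rule("r1", lambda s: s["gpa"] < 2.0, "in_trouble", 0.9),
    Rule("r2", lambda s: s["gpa"] >= 3.0, "good", 0.8),
    Rule("r5", lambda s: s["gpa"] < 3.0, "good", 0.6),  # deliberately poor rule
    Rule("r3", lambda s: s["credits"] < 10, "in_trouble", 0.5),
    Rule("r4", lambda s: True, "average", 0.1),
]

selected, uncovered = extract_rules(rules, samples, labels)
print([r.name for r in selected], uncovered)
```

Because the selected rules form an ordered list, `classify` can report which single rule produced a prediction, which is the kind of explanation a random forest's vote does not offer directly.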