• Title of article

    A novel approach to estimate proximity in a random forest: An exploratory study

  • Author/Authors

    Englund, C. and Verikas, A.

  • Issue Information
    Periodical with serial issue number, year 2012
  • Pages
    5
  • From page
    13046
  • To page
    13050
  • Abstract
    A data proximity matrix is an important source of information in random forest (RF) based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and detection of mislabeled data samples. This work proposes a novel approach to estimating proximity, based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of a data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is shown experimentally that the proposed approach improves the proximity estimate, especially when the RF consists of a small number of trees. It is also demonstrated that, for some tasks, an SVM using the suggested proximity-matrix-based kernel outperforms both an SVM based on a standard radial basis function kernel and one based on the standard proximity-matrix-based kernel.
  • Keywords
    Random forest, Proximity matrix, Kernel matrix, Support vector machine, Data mining
  • Journal title
    Expert Systems with Applications
  • Serial Year
    2012
  • Record number

    2352768
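
The abstract refers to the standard RF proximity matrix as a baseline. As a minimal sketch of that baseline only (not the terminal-node distance approach proposed in the paper, whose details the abstract does not give), the proximity of two samples can be taken as the fraction of trees in which they fall into the same terminal node. The function name `standard_proximity` and the leaf assignments below are hypothetical, chosen purely for illustration.

```python
from typing import List, Sequence


def standard_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Standard RF proximity from leaf assignments.

    leaves[i][t] is the id of the terminal node that sample i reaches
    in tree t. The proximity of samples i and j is the fraction of
    trees in which both reach the same terminal node.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments (assumed data): 4 samples, 3 trees.
leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [1, 0, 0],
]
P = standard_proximity(leaves)
```

With scikit-learn, leaf ids of this shape could plausibly be obtained from `RandomForestClassifier.apply`, and the resulting matrix passed to `SVC(kernel="precomputed")` to use it as a kernel in the way the abstract describes; that wiring is an assumption here, not something the record specifies.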