• DocumentCode
    237261
  • Title

    A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering

  • Author

    Qin Liu ; Shihai Shi ; Hongming Zhu ; Jiakai Xiao

  • Author_Institution
    Sch. of Software Eng., Tongji Univ., Shanghai, China
  • fYear
    2014
  • fDate
    21-25 July 2014
  • Firstpage
    27
  • Lastpage
    32
  • Abstract
    Feature selection methods aim to find the subset of the original features that yields the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately. However, for some data sets the two can be combined into a hybrid feature selection method. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters by agglomerative hierarchical clustering, based on their similarity to one another. Then, in the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature, which represents the class label, is selected as the representative feature. These representative features compose the feature subset. Our contributions are 1) the newly proposed feature selection method and 2) the application of feature clustering to software cost estimation. The proposed method employs a wrapper approach, so it can evaluate the prediction performance of each feature subset to determine the optimal one. Experimental results in software cost estimation show that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% on the ISBSG R8 and Desharnais data sets in terms of the PRED(0.25) value.
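    The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete features, uses plain (unnormalized) mutual information as the similarity, and uses average linkage for the agglomerative step, none of which the abstract specifies. The function names (`mutual_information`, `cluster_features`, `select_representatives`, `pred`) are hypothetical. The wrapper loop, which would try several cluster counts and keep the subset with the best prediction performance, is omitted; `pred` only shows the standard PRED(level) criterion such a loop could score with.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def cluster_features(features, k):
    """Unsupervised stage: group feature names into k clusters.

    Assumption: average-linkage agglomerative clustering with MI as similarity.
    `features` maps a feature name to its list of discrete values.
    """
    names = list(features)
    sim = {(a, b): mutual_information(features[a], features[b])
           for a in names for b in names if a != b}

    def linkage(c1, c2):
        # Average pairwise similarity between two clusters.
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        # Merge the two most similar clusters.
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_representatives(features, target, k):
    """Supervised stage: per cluster, keep the feature with the highest MI to the target."""
    return [max(cluster, key=lambda name: mutual_information(features[name], target))
            for cluster in cluster_features(features, k)]


def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of estimates whose relative error is at most `level`.

    Assumes every actual value is positive.
    """
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Toy data (made up for illustration): f2 nearly copies f1, f4 nearly copies f3.
toy_features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 0, 1, 1, 0, 0, 1, 0],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
    "f4": [0, 1, 0, 1, 0, 1, 1, 1],
}
toy_target = [0, 0, 1, 1, 0, 0, 1, 1]
selected = select_representatives(toy_features, toy_target, k=2)
```

    On this toy data the redundant pairs end up in the same cluster, so only one feature per pair survives. Real effort data sets such as ISBSG or Desharnais contain continuous attributes, which would need discretization or a different MI estimator before this sketch applies.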
  • Keywords
    feature selection; pattern clustering; software cost estimation; unsupervised learning; feature clustering; feature similarity; mutual information-based hybrid feature selection method; Cognition; Entropy; Estimation; Mutual information; Random variables; Redundancy; Software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual
  • Conference_Location
    Västerås
  • Type

    conf

  • DOI
    10.1109/COMPSAC.2014.99
  • Filename
    6899197