• DocumentCode
    3688449
  • Title

    Analysis of feature selection techniques in credit risk assessment

  • Author

    R. S. Ramya;S. Kumaresan

  • Author_Institution
    Department of CSE, Government College of Technology, Coimbatore
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Data mining is the automated extraction of hidden knowledge from large amounts of data. The computational complexity of data mining algorithms increases rapidly as the number of features in the dataset increases. Real-world credit datasets have accumulated large quantities of information about clients and their financial and payment history. Feature selection techniques are applied to such high-dimensional data to reduce the dimensionality by removing irrelevant and redundant features, thereby improving the predictive accuracy of data mining algorithms. The objective of this work is to study the information gain, gain ratio, and chi-square correlation-based feature selection methods for reducing feature dimensionality. The information gain measure identifies the entropy value of each specific feature. The amount of information gain, or entropy, is used to decide whether a feature is selected or deleted. Gain ratio applies a normalization technique to information gain using the split information value. Correlation-based feature selection uses heuristic search strategies to estimate how strongly the features are correlated with the class attribute and how independent they are of each other. Experiments were conducted on the German credit dataset, available at the UCI Machine Learning Repository, to reduce the feature dimensionality using these feature selection methods.
  • Keywords
    "Data mining","Correlation","Entropy","History","Communication systems","Prediction algorithms","Filtering algorithms"
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing and Communication Systems, 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/ICACCS.2015.7324139
  • Filename
    7324139
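
The abstract describes information gain and its normalization by split information (gain ratio). The following is a minimal sketch of those two measures using the standard textbook definitions, not the authors' implementation. The function names and the toy data are hypothetical and are not taken from the paper or from the German credit dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class entropy after partitioning by the feature's values."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by split information (entropy of the feature)."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: four clients with a binary credit class.
labels = ["good", "good", "bad", "bad"]
predictive = ["x", "x", "y", "y"]    # matches the class exactly
irrelevant = ["p", "q", "p", "q"]    # unrelated to the class
unique_id = ["1", "2", "3", "4"]     # one distinct value per client

for name, feat in [("predictive", predictive),
                   ("irrelevant", irrelevant),
                   ("unique_id", unique_id)]:
    print(name, information_gain(feat, labels), gain_ratio(feat, labels))
```

In this toy example the identifier-like feature reaches the same information gain as the truly predictive one, but its gain ratio is lower because its split information is larger. This is the effect the normalization is meant to have on many-valued features.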