Analysis of feature selection techniques in credit risk assessment

Author

R. S. Ramya;S. Kumaresan

Author_Institution

Department of CSE, Government College of Technology, Coimbatore

fYear

2015

Firstpage

1

Lastpage

6

Abstract

Data mining is the automated extraction of hidden knowledge from large amounts of data. The computational complexity of data mining algorithms increases rapidly as the number of features in the dataset increases. Real-world credit datasets accumulate large quantities of information about clients and their financial and payment history. Feature selection techniques are applied to such high-dimensional data to reduce its dimensionality by removing irrelevant and redundant features, which improves the predictive accuracy of data mining algorithms. The objective of this work is to study the information gain, gain ratio, chi-square and correlation-based feature selection methods for reducing feature dimensionality. The information gain measure is based on the entropy value of each feature. The amount of information gain or entropy is used to decide whether a feature is selected or discarded. Gain ratio normalizes information gain using the split information value. Correlation-based feature selection uses heuristic search strategies to estimate how strongly the features are correlated with the class attribute and how independent they are of each other. Experiments were conducted on the German credit dataset, available from the UCI Machine Learning Repository, to reduce the feature dimensionality using these feature selection methods.
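
The two entropy-based measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `gain_ratio`) and the four-row toy data are assumptions made purely for illustration, and the sketch handles only categorical features. Information gain is taken here as the class entropy minus the class entropy conditioned on the feature, and gain ratio as information gain divided by the split information (the entropy of the feature's own value distribution).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Class entropy minus the class entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the split information of the feature."""
    split_info = entropy(feature)
    if split_info == 0:  # feature has a single value, so it cannot split the data
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data (not taken from the German credit dataset).
labels = ["good", "good", "bad", "bad"]
informative = ["x", "x", "y", "y"]    # separates the classes perfectly
uninformative = ["p", "q", "p", "q"]  # tells nothing about the class
constant = ["k", "k", "k", "k"]       # single-valued feature

scores = {
    "informative": gain_ratio(informative, labels),
    "uninformative": gain_ratio(uninformative, labels),
    "constant": gain_ratio(constant, labels),
}
```

Under these assumptions, a filter-style selector would rank features by such scores and keep those above a chosen threshold; the threshold itself is a design choice that the abstract does not specify.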

Keywords

"Data mining","Correlation","Entropy","History","Communication systems","Prediction algorithms","Filtering algorithms"

Publisher

IEEE

Conference_Titel

Advanced Computing and Communication Systems, 2015 International Conference on

Type

conf

DOI

10.1109/ICACCS.2015.7324139

Filename

7324139