DocumentCode :
1879213
Title :
A Novel Classifier-Independent Feature Selection Algorithm for Imbalanced Datasets
Author :
Zhu, Quanyin ; Cao, Suqun
Author_Institution :
Dept. of Comput. Eng., Huaiyin Inst. of Technol., Huaiyin, China
fYear :
2009
fDate :
27-29 May 2009
Firstpage :
77
Lastpage :
82
Abstract :
A novel classifier-independent feature selection algorithm based on posterior probability is proposed for imbalanced datasets. First, an imbalance factor is introduced and computed by Parzen-window estimation. The midpoint of each Tomek link is chosen as an initial point. From there, the algorithm iterates to find boundary points at which the posterior probabilities are equal. The weight of each feature, which indicates its importance, is then obtained by projection computation on the normal vectors at these points. Experimental results on 3 real-world datasets demonstrate that the proposed algorithm not only reduces computational cost but also overcomes a shortcoming of conventional feature selection algorithms, in which the majority class may be detected well while the minority class is ignored.
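The abstract does not spell out how the initial points are computed, so the following is only a minimal sketch of that one step, not the authors' implementation. It assumes the standard definition of a Tomek link (two samples from different classes that are each other's nearest neighbor) and Euclidean distance; the function name `tomek_link_midpoints` is hypothetical. The Parzen-window estimation and the boundary-point iteration are not covered.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_link_midpoints(points, labels):
    """Return the midpoint of every Tomek link in a labeled point set.

    Assumption: a Tomek link is a pair of samples with different labels
    that are mutual nearest neighbors under Euclidean distance.
    """
    n = len(points)
    if n < 2:
        return []

    def nearest(i):
        # Index of the sample closest to sample i (excluding i itself).
        return min((j for j in range(n) if j != i),
                   key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(n)]

    midpoints = []
    for i, j in enumerate(nn):
        # i < j avoids reporting the same pair twice.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            midpoints.append([(a + b) / 2 for a, b in zip(points[i], points[j])])
    return midpoints
```

The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a sketch but would likely be replaced by a spatial index on larger datasets.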
Keywords :
estimation theory; pattern classification; probability; Parzen-window estimation; Tomek link; classifier-independent feature selection; imbalanced dataset; posterior probability; Artificial intelligence; Computer networks; Concurrent computing; Data engineering; Distributed computing; Electronic mail; Intelligent networks; Mechanical engineering; Software algorithms; Software engineering; feature selection; imbalanced datasets; posterior probability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
Conference_Location :
Daegu
Print_ISBN :
978-0-7695-3642-2
Type :
conf
DOI :
10.1109/SNPD.2009.47
Filename :
5286691