DocumentCode
1879213
Title
A Novel Classifier-Independent Feature Selection Algorithm for Imbalanced Datasets
Author
Zhu, Quanyin ; Cao, Suqun
Author_Institution
Dept. of Comput. Eng., Huaiyin Inst. of Technol., Huaiyin, China
fYear
2009
fDate
27-29 May 2009
Firstpage
77
Lastpage
82
Abstract
A novel classifier-independent feature selection algorithm based on posterior probability is proposed for imbalanced datasets. First, an imbalanced factor is introduced and computed by Parzen-window estimation. The midpoint of each Tomek link is chosen as an initial point, and the algorithm then iterates to find the boundary points at which the posterior probabilities are equal. Through projection computation on the normal vectors at these points, a weight is obtained for each feature, which indicates the importance of that feature. Experimental results on 3 real-world datasets demonstrate that the proposed algorithm not only reduces the computational cost but also overcomes a shortcoming of conventional feature selection algorithms, in which the majority class may be detected well while the minority class is ignored.
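The first two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian kernel, and the bandwidth `h` are assumptions made for illustration, and the paper's imbalanced factor and boundary-point iteration are not reproduced because the abstract does not define them. The sketch only finds Tomek-link midpoints (opposite-class pairs that are mutual nearest neighbours) and evaluates a Parzen-window posterior estimate at a point.

```python
import numpy as np

def tomek_link_midpoints(X, y):
    """Midpoints of Tomek links: opposite-class pairs that are
    each other's nearest neighbour (hypothetical helper)."""
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mids = []
    for i, j in enumerate(nn):
        # i < j counts each mutual pair once.
        if i < j and nn[j] == i and y[i] != y[j]:
            mids.append((X[i] + X[j]) / 2.0)
    return np.array(mids)

def parzen_posterior(x, X, y, h=1.0):
    """Gaussian Parzen-window estimate of class posteriors at x.
    With empirical priors and a shared bandwidth h (an assumption),
    the posterior is proportional to the per-class kernel sum."""
    classes = np.unique(y)
    k = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h * h))
    scores = np.array([k[y == c].sum() for c in classes])
    return classes, scores / scores.sum()

# Toy imbalanced data: one minority sample (class 0), three majority.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
y = np.array([0, 1, 1, 1])
mids = tomek_link_midpoints(X, y)
classes, post = parzen_posterior(mids[0], X, y, h=1.0)
```

In this toy case the only Tomek link joins the minority sample to its nearest majority neighbour. Even at that midpoint the unweighted posterior leans slightly toward the majority class, which is the kind of bias an imbalance correction would be expected to address.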
Keywords
estimation theory; pattern classification; probability; Parzen-window estimation; Tomek link; classifier-independent feature selection; imbalanced dataset; posterior probability; Artificial intelligence; Computer networks; Concurrent computing; Data engineering; Distributed computing; Electronic mail; Intelligent networks; Mechanical engineering; Software algorithms; Software engineering; feature selection; imbalanced datasets
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
Conference_Location
Daegu
Print_ISBN
978-0-7695-3642-2
Type
conf
DOI
10.1109/SNPD.2009.47
Filename
5286691