Title : 
Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems
         
        
            Author : 
Feng Yang ; Mao, K.Z. ; Lee, Gary Kee Khoon ; Wenyin Tang
         
        
            Author_Institution : 
Dept. of Comput. Sci., Agency for Sci., Technol. & Res. (A*STAR), Singapore, Singapore
         
        
            Abstract : 
Although mostly used for pattern classification, linear discriminant analysis (LDA) can also be used in feature selection as an effective measure of the separative ability of a feature subset. When applied to feature selection on high-dimensional small-sized (HDSS) data, which generally exhibit class imbalance, LDA encounters four problems: singularity of the scatter matrix, overfitting, overwhelming of the minority class by the majority class, and prohibitive computational complexity. In this study, we propose an LDA-based feature selection method, minority class emphasized linear discriminant analysis (MCE-LDA), with a new regularization technique that addresses the first three problems. Unlike conventional forms of regularization, which give equal or more emphasis to the majority class, the proposed regularization places more emphasis on the minority class, with the expectation of improving overall performance by alleviating both the overwhelming of the minority class by the majority class and overfitting in the minority class. To reduce computational overhead, an incremental implementation of LDA-based feature selection is introduced. Comparative studies against other forms of LDA regularization and against other popular feature selection methods on five HDSS problems show that MCE-LDA produces feature subsets with excellent classification performance and robustness. Further experimental results on true positive rate (TPR) and true negative rate (TNR) verify the effectiveness of the proposed technique in alleviating the overwhelming and overfitting problems.
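
The idea of scoring a feature subset with an LDA criterion can be illustrated with a minimal sketch. This is not the MCE-LDA method: the abstract does not state the proposed regularization, so a generic ridge term is swapped in to keep the within-class scatter matrix invertible. The names `lda_subset_score` and `forward_select`, the `ridge` parameter, the trace criterion, and the greedy forward search are all assumptions made for illustration. The sketch also recomputes the scatter matrices from scratch for every candidate, rather than using an incremental implementation.

```python
import numpy as np


def lda_subset_score(X, y, subset, ridge=1e-3):
    """Score a feature subset with trace((S_w + ridge * I)^-1 S_b).

    Larger values indicate better class separation. The ridge term is a
    generic regularizer (an assumption), not the MCE-LDA regularization.
    """
    Xs = X[:, subset]
    overall_mean = Xs.mean(axis=0)
    k = Xs.shape[1]
    S_w = np.zeros((k, k))  # within-class scatter
    S_b = np.zeros((k, k))  # between-class scatter
    for label in np.unique(y):
        Xc = Xs[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Without the ridge term, S_w is singular whenever the subset has
    # more features than (samples - classes), as is typical for HDSS data.
    S_w_reg = S_w + ridge * np.eye(k)
    return float(np.trace(np.linalg.solve(S_w_reg, S_b)))


def forward_select(X, y, n_features, ridge=1e-3):
    """Greedy forward selection: add the feature that most raises the score."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        best = max(
            remaining,
            key=lambda j: lda_subset_score(X, y, selected + [j], ridge),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

A call such as `forward_select(X, y, 5)` on an array `X` of shape (samples, features) with a label vector `y` returns five column indices. Because the ridge term weights all classes equally, this sketch does nothing to counter the overwhelming of the minority class that the abstract describes.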
         
        
            Keywords : 
feature selection; pattern classification; HDSS data; MCE-LDA; TNR; TPR; feature subset selection; feature subset separative ability; high-dimensional small-sized problems; majority class; minority class emphasized linear discriminant analysis; regularization technique; true negative rate; true positive rate; Computational complexity; Data engineering; Error analysis; IEEE transactions; Knowledge engineering; Linear discriminant analysis; Vectors; class emphasis; classification; regularized linear discriminant analysis; robustness
         
        
        
            Journal_Title : 
Knowledge and Data Engineering, IEEE Transactions on
         
        
            DOI : 
10.1109/TKDE.2014.2320732