Title :
Extremely High-Dimensional Feature Selection via Feature Generating Samplings
Author :
Shutao Li ; Dan Wei
Author_Institution :
Coll. of Electr. & Inf. Eng., Hunan Univ., Changsha, China
Abstract :
This paper proposes a sampling scheme that improves the efficiency of the recently developed feature generating machines (FGMs) when selecting informative features on extremely high-dimensional problems. FGMs require O(m log r) time to order the features by their scores, where m is the feature dimensionality and r is the size of the selected feature subset; this feature-ordering cost becomes prohibitive when m is very large, for example, m > 10^11. To address this problem, we propose a feature generating sampling method that reduces this computational complexity to O(G_s log(G) + G(G + log(G))) while preserving the most informative features in a feature buffer, where G_s is the maximum number of nonzero features per instance and G is the buffer size. Moreover, we show, using random process theory, that the proposed sampling scheme can be regarded as a birth-death process, which guarantees that most of the informative features are included for feature selection. Empirical studies on real-world datasets demonstrate the effectiveness of the proposed sampling method.
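The O(m log r) ordering cost quoted in the abstract matches, for example, the cost of keeping the r best of m scores in a bounded min-heap. The Python sketch below illustrates only that baseline, under the assumption that each feature already has a scalar score. It is not the authors' FGM code or the proposed sampling method, and the function name `top_r_features` and the sample scores are hypothetical.

```python
import heapq


def top_r_features(scores, r):
    """Return the indices of the r highest-scoring features, best first.

    Uses a min-heap that never holds more than r entries, so scanning
    m scores costs O(m log r) time and O(r) extra memory.
    """
    if r <= 0:
        return []
    heap = []  # min-heap of (score, feature_index) pairs
    for index, score in enumerate(scores):
        if len(heap) < r:
            heapq.heappush(heap, (score, index))
        elif score > heap[0][0]:
            # The new score beats the weakest kept feature: swap it in.
            heapq.heapreplace(heap, (score, index))
    # Sort the r survivors so the best-scoring feature comes first.
    return [index for _, index in sorted(heap, reverse=True)]


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.3, 0.7, 0.05]  # hypothetical scores
    print(top_r_features(example_scores, 2))  # prints [1, 3]
```

Every one of the m scores is still visited once, which is the part that becomes impractical at the dimensionalities the abstract mentions.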
Keywords :
computational complexity; feature selection; random processes; sampling methods; birth-death process; computational cost; extremely high-dimensional feature selection; feature dimensionality; feature generating samplings; feature ordering; high-dimensional problems; informative feature selection; random processes theory; time complexity; Algorithm design and analysis; Analytical models; Complexity theory; Computational efficiency; Training; Vectors; Extremely high dimensional problem; feature generating machine; informative feature
Journal_Title :
Cybernetics, IEEE Transactions on
DOI :
10.1109/TCYB.2013.2269765