DocumentCode :
245002
Title :
Towards Scalable and Accurate Online Feature Selection for Big Data
Author :
Kui Yu ; Xindong Wu ; Wei Ding ; Jian Pei
Author_Institution :
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
fYear :
2014
fDate :
14-17 Dec. 2014
Firstpage :
660
Lastpage :
669
Abstract :
Feature selection is important in many big data applications. There are at least two critical challenges. First, in many applications the dimensionality is extremely high, in the millions, and keeps growing. Second, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. An empirical study using a series of benchmark real data sets shows that SAOLA is scalable on data sets of extremely high dimensionality and has superior performance over state-of-the-art feature selection methods.
Keywords :
Big Data; data mining; SAOLA; feature selection; pairwise correlation; scalable and accurate online approach; Accuracy; Correlation; Markov processes; Redundancy; Search problems; Training; Extremely high dimensionality; Feature redundancy; Online feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
ISSN :
1550-4786
Print_ISBN :
978-1-4799-4303-6
Type :
conf
DOI :
10.1109/ICDM.2014.63
Filename :
7023383