DocumentCode :
2682537
Title :
Distributed Approach to Feature Selection From Very Large Data Sets Using BLEM2
Author :
Chan, Chien-Chung ; Selvaraj, Sivaraj
Author_Institution :
Dept. of Comput. Sci., Akron Univ., OH
fYear :
2006
fDate :
3-6 June 2006
Firstpage :
559
Lastpage :
563
Abstract :
Feature selection is an important step in the preprocessing of raw data for data mining. It involves eliminating redundant and irrelevant features from the dataset to obtain a subset of features that performs as efficiently as the complete set. The wrapper approach to the problem of feature selection is to use an induction algorithm to select the features. Most induction algorithms fail to handle large datasets. The obvious method that can be employed to handle large datasets is "divide and conquer". This paper introduces a strategy for finding features from a collection of distributed subsets using the BLEM2 rule-based inductive learning program. Heuristics for determining the proper number of subsets and the proper subset size are proposed. The proposed strategy has been tested on the intrusion detection systems dataset made available by MIT Lincoln Labs.
Keywords :
data mining; set theory; very large databases; BLEM2; divide and conquer method; feature selection; intrusion detection systems; rule-based inductive learning program; very large data sets; wrapper approach; Computational efficiency; Computer science; Data mining; Databases; Explosives; Filters; Intrusion detection; Performance analysis; Redundancy; System testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
NAFIPS 2006: Annual Meeting of the North American Fuzzy Information Processing Society
Conference_Location :
Montreal, Que.
Print_ISBN :
1-4244-0362-6
Electronic_ISBN :
1-4244-0363-4
Type :
conf
DOI :
10.1109/NAFIPS.2006.365470
Filename :
4216863