Regional Information Center for Science and Technology - Distributed Approach to Feature Selection From Very Large Data Sets Using BLEM2

DocumentCode :

2682537

Title :

Distributed Approach to Feature Selection From Very Large Data Sets Using BLEM2

Author :

Chan, Chien-Chung ; Selvaraj, Sivaraj

Author_Institution :

Dept. of Comput. Sci., Akron Univ., OH

fYear :

2006

fDate :

3-6 June 2006

Firstpage :

559

Lastpage :

563

Abstract :

Feature selection is an important step in preprocessing raw data for data mining. It eliminates redundant and irrelevant features from the dataset to obtain a subset of features that performs as well as the complete set. The wrapper approach to feature selection uses an induction algorithm to select the features. Most induction algorithms fail to handle large datasets. The obvious way to handle large datasets is "divide and conquer". This paper introduces a strategy for finding features from a collection of distributed subsets using the BLEM2 rule-based inductive learning program. Heuristics for determining the proper number of subsets and the proper subset size are proposed. The proposed strategy has been tested on the intrusion detection systems dataset made available by MIT Lincoln Labs.
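
The divide-and-conquer idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. BLEM2 is not reproduced here, so a trivial one-rule learner stands in as the induction algorithm. The abstract does not say how per-subset results are combined, so the voting step is an assumption, as are the parameters `n_subsets`, `margin`, and `min_votes` and every function name.

```python
import random
from collections import Counter, defaultdict


def split_into_subsets(rows, n_subsets, seed=0):
    """Shuffle the rows and deal them into n_subsets disjoint parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def one_rule_accuracy(rows, feature_idx):
    """Training accuracy when each value of one feature predicts its majority label."""
    by_value = defaultdict(Counter)
    for features, label in rows:
        by_value[features[feature_idx]][label] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / len(rows)


def select_on_subset(rows, n_features, margin):
    """Stand-in for the induction step (NOT BLEM2): keep a feature if its
    one-rule accuracy beats the majority-class baseline by at least margin."""
    baseline = max(Counter(label for _, label in rows).values()) / len(rows)
    return {
        j for j in range(n_features)
        if one_rule_accuracy(rows, j) >= baseline + margin
    }


def distributed_feature_selection(rows, n_features, n_subsets, margin, min_votes):
    """Select features on each subset, then keep those chosen by at least
    min_votes subsets (the voting rule is an assumption of this sketch)."""
    votes = Counter()
    for subset in split_into_subsets(rows, n_subsets):
        votes.update(select_on_subset(subset, n_features, margin))
    return sorted(j for j, count in votes.items() if count >= min_votes)


# Toy data: feature 0 equals the label, feature 1 is constant,
# feature 2 is balanced across both labels and carries no information.
rows = [((i % 2, 0, (i // 2) % 2), i % 2) for i in range(400)]
selected = distributed_feature_selection(
    rows, n_features=3, n_subsets=4, margin=0.2, min_votes=3
)
```

In this sketch each subset could be processed on a separate machine, since the per-subset selections are independent and only the vote counts need to be gathered.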

Keywords :

data mining; set theory; very large databases; BLEM2; divide and conquer method; feature selection; intrusion detection systems; rule-based inductive learning program; very large data sets; wrapper approach; Computational efficiency; Computer science; Data mining; Databases; Explosives; Filters; Intrusion detection; Performance analysis; Redundancy; System testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

NAFIPS 2006: Annual Meeting of the North American Fuzzy Information Processing Society

Conference_Location :

Montreal, Que.

Print_ISBN :

1-4244-0362-6

Electronic_ISBN :

1-4244-0363-4

Type :

conf

DOI :

10.1109/NAFIPS.2006.365470

Filename :

4216863

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2682537