Regional Information Center for Science and Technology - Robust method of sparse feature selection for multi-label classification with Naive Bayes

DocumentCode :

130363

Title :

Robust method of sparse feature selection for multi-label classification with Naive Bayes

Author :

Ruta, Dymitr

Author_Institution :

British Telecom Innovation Centre, Khalifa Univ., Abu Dhabi, United Arab Emirates

fYear :

2014

fDate :

7-10 Sept. 2014

Firstpage :

375

Lastpage :

380

Abstract :

The explosive growth of big data poses a processing challenge for predictive systems in terms of both data size and dimensionality. Generating features from text often leads to many thousands of sparse features that rarely take non-zero values. In this work we propose a very fast and robust feature selection method that is optimised with the Naive Bayes classifier. The method takes advantage of the sparse feature representation and uses a diversified backward-forward greedy search to arrive at a highly competitive solution in minimal processing time. It promotes the paradigm of shifting the complexity of predictive systems away from the model algorithm and towards careful data preprocessing and filtering, which allows predictive big data tasks to be accomplished on a single processor despite billions of data examples nominally exposed for processing. The method was applied to the AAIA Data Mining Competition 2014, concerned with predicting human injuries resulting from fire incidents based on nearly 12000 risk factors extracted from thousands of fire incident reports, and took second place with a predictive accuracy of 96%.
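
The general idea in the abstract (greedy backward-forward feature selection around a Naive Bayes model on sparse binary features) can be illustrated with a minimal sketch. This is not the author's algorithm: the function names (`fit_log_odds`, `accuracy`, `greedy_select`), the single binary label, the use of training accuracy as the selection criterion, and the toy data are all assumptions made for illustration, and the "diversified" element of the search is not reproduced. Each example is stored as the set of its non-zero feature indices, so counting touches only active entries.

```python
import math

def fit_log_odds(rows, labels, n_features, alpha=1.0):
    """Bernoulli Naive Bayes log-odds terms with Laplace smoothing.

    rows: list of sets holding the indices of non-zero features.
    Returns (bias, on, off), where on[j] / off[j] is the log-odds
    contribution of feature j being active / inactive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cnt_pos = [0] * n_features
    cnt_neg = [0] * n_features
    for active, y in zip(rows, labels):
        for j in active:  # only non-zero entries are visited
            if y:
                cnt_pos[j] += 1
            else:
                cnt_neg[j] += 1
    on, off = [], []
    for j in range(n_features):
        p1 = (cnt_pos[j] + alpha) / (n_pos + 2 * alpha)
        p0 = (cnt_neg[j] + alpha) / (n_neg + 2 * alpha)
        on.append(math.log(p1 / p0))
        off.append(math.log((1 - p1) / (1 - p0)))
    bias = math.log((n_pos + alpha) / (n_neg + alpha))
    return bias, on, off

def accuracy(rows, labels, selected, model):
    """Training accuracy of Naive Bayes restricted to `selected` features."""
    bias, on, off = model
    # Score if every selected feature were inactive; corrected per row below.
    base = bias + sum(off[j] for j in selected)
    hits = 0
    for active, y in zip(rows, labels):
        score = base + sum(on[j] - off[j] for j in active if j in selected)
        hits += int((score > 0) == bool(y))
    return hits / len(labels)

def greedy_select(rows, labels, n_features, max_rounds=10):
    """Alternate forward (add) and backward (drop) passes until no gain."""
    model = fit_log_odds(rows, labels, n_features)
    selected = set()
    best = accuracy(rows, labels, selected, model)
    for _ in range(max_rounds):
        improved = False
        # Forward pass: add a feature only if it strictly improves accuracy.
        for j in range(n_features):
            if j in selected:
                continue
            acc = accuracy(rows, labels, selected | {j}, model)
            if acc > best:
                selected.add(j)
                best, improved = acc, True
        # Backward pass: drop a feature if accuracy does not get worse.
        for j in sorted(selected):
            acc = accuracy(rows, labels, selected - {j}, model)
            if acc >= best:
                selected.discard(j)
                improved = improved or acc > best
                best = acc
        if not improved:
            break
    return selected, best

# Toy data (made up): feature 0 separates the classes, 1 and 2 are noise.
rows = [{0}, {0, 1}, {0}, {0, 2}, {1}, set(), {1}, set()]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
selected, best = greedy_select(rows, labels, n_features=3)
print(sorted(selected), best)
```

Because Naive Bayes scores are sums of independent per-feature terms, the per-feature statistics are fitted once and each candidate subset is evaluated without refitting, which is what keeps a greedy wrapper of this kind cheap. In practice a held-out validation score would be a more reliable selection criterion than training accuracy.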

Keywords :

Bayes methods; feature selection; greedy algorithms; pattern classification; search problems; AAIA Data Mining Competition 2014; Big Data; data dimensionality; data preprocessing; data size; diversified backward-forward greedy search; filtering; fire incident reports; human injuries prediction; multilabel classification; naive Bayes classifier; predictive systems; risk factors; robust feature selection method; single processor; sparse feature representation; sparse feature selection; Big data; Data mining; Data models; Feature extraction; Measurement; Predictive models; Robustness;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on

Conference_Location :

Warsaw

Type :

conf

DOI :

10.15439/2014F502

Filename :

6933040

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=130363