Regional Information Center for Science and Technology - A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure

DocumentCode :

50635

Title :

A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure

Author :

Naghibi, Tofigh ; Hoffmann, Sarah ; Pfister, Beat

Author_Institution :

Comput. Eng. & Networks Lab., ETH Zurich, Zurich, Switzerland

Volume :

Issue :

fYear :

2015

fDate :

Aug. 1, 2015

Firstpage :

1529

Lastpage :

1541

Abstract :

Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem there are two main issues that need to be addressed: (i) finding an appropriate measure function that can be computed fairly quickly and robustly for high-dimensional data, and (ii) finding a search strategy that optimizes the measure over the subset space in a reasonable amount of time. In this article, mutual information between features and class labels is considered to be the measure function. Two series expansions for mutual information are proposed, and it is shown that most heuristic criteria suggested in the literature are truncated approximations of these expansions. It is well known that searching the whole subset space is an NP-hard problem. Here, instead of the conventional sequential search algorithms, we suggest a parallel search strategy based on semidefinite programming (SDP) that can search through the subset space in polynomial time. By exploiting the similarities between the proposed algorithm and an instance of the maximum-cut problem in graph theory, the approximation ratio of this algorithm is derived and compared with the approximation ratio of the backward elimination method. The experiments show that it can be misleading to judge the quality of a measure based solely on the classification accuracy, without taking the effect of the non-optimum search strategy into account.
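
The two ingredients named in the abstract, a mutual-information measure and a search over subsets, can be illustrated with a minimal sketch. This record does not reproduce the paper's series expansions or its SDP formulation, so the code below is not the authors' method. It assumes a generic pairwise criterion (relevance minus a weighted redundancy term, in the spirit of the truncated heuristics the abstract mentions) and a brute-force search that is only feasible for a handful of features. The names `mutual_information`, `pairwise_score`, `best_subset`, and the weight `beta` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2( p(x, y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def pairwise_score(subset, features, labels, beta=1.0):
    """Illustrative criterion (an assumption, not the paper's expansion):
    sum of feature/label relevance minus beta times pairwise redundancy."""
    relevance = sum(mutual_information(features[i], labels) for i in subset)
    redundancy = sum(
        mutual_information(features[i], features[j])
        for i, j in combinations(subset, 2)
    )
    return relevance - beta * redundancy


def best_subset(features, labels, k, beta=1.0):
    """Exhaustive search over all size-k subsets; the number of candidates
    grows combinatorially, which is why practical methods avoid it."""
    candidates = combinations(range(len(features)), k)
    return max(candidates, key=lambda s: pairwise_score(s, features, labels, beta))


if __name__ == "__main__":
    labels = [0, 1, 2, 3]
    features = [
        [0, 0, 1, 1],  # informative
        [0, 0, 1, 1],  # exact copy of the first feature (fully redundant)
        [0, 1, 0, 1],  # informative and independent of the first feature
    ]
    print(best_subset(features, labels, k=2))
```

On this toy data the redundancy term penalizes selecting both copies of the same feature, so the search prefers a pair of complementary features. A sequential or SDP-based strategy would replace `best_subset` while keeping the same scoring idea.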

Keywords :

approximation theory; computational complexity; data mining; feature selection; graph theory; mathematical programming; pattern classification; search problems; set theory; NP-hard problem; approximation ratio; backward elimination method; classification accuracy; data-mining applications; feature subset selection problem; high-dimensional data; maximum-cut problem; mutual information measure; parallel search strategy; polynomial time; semidefinite programming based search strategy; subset space; truncated approximations; approximation algorithms; approximation methods; feature extraction; measurement uncertainty; mutual information; vectors; convex objective

fLanguage :

English

Journal_Title :

Pattern Analysis and Machine Intelligence, IEEE Transactions on

Publisher :

IEEE

ISSN :

0162-8828

Type :

jour

DOI :

10.1109/TPAMI.2014.2372791

Filename :

6963440

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=50635