Label Construction for Multi-label Feature Selection

Author

Spolaor, Newton; Monard, Maria Carolina; Tsoumakas, Grigorios; Lee, Huei

Author_Institution

Lab. of Comput. Intell., Univ. of Sao Paulo, Sao Carlos, Brazil

fYear

2014

fDate

18-22 Oct. 2014

Firstpage

247

Lastpage

252

Abstract

Multi-label learning handles datasets in which each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning suffers from the curse of dimensionality, which can be mitigated by dimensionality reduction techniques such as feature selection. The standard approach to multi-label feature selection transforms the multi-label dataset into single-label datasets before applying traditional feature selection algorithms. However, this approach often ignores label dependence. This work proposes an alternative method, LCFS, which constructs new labels based on relations between the original labels and uses them to augment the label set of the original dataset. The augmented dataset is then submitted to the standard multi-label feature selection approach. Experiments using Information Gain as the feature evaluation measure were carried out on 10 multi-label benchmark datasets. For each dataset, the quality of the selected features was assessed by the quality of the classifiers built with them, comparing the features selected by the standard approach on the original dataset with those selected on the datasets constructed by four LCFS settings. The results show that configuring LCFS with simple strategies based on pairs of labels yields better classifiers than those built using the standard approach on the original dataset. Moreover, these good results are achieved when only a small number of features is selected.
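The pipeline summarized in the abstract (transform into one single-label problem per label, score features with Information Gain, optionally augment the label set with labels constructed from label pairs first) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: features are assumed to be discrete (already discretized), the per-label Information Gain scores are assumed to be aggregated by averaging, and the pairwise AND/XOR rules are hypothetical stand-ins for the paper's label construction strategies, which may differ from its four LCFS settings. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature) for a discrete feature."""
    n = len(label)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, label) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(label) - conditional


def br_feature_scores(X, Y):
    """Binary Relevance style scoring (assumption: mean IG over all labels).

    X: one list of discrete feature values per instance.
    Y: one list of 0/1 label indicators per instance.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for f in range(n_features):
        column = [row[f] for row in X]
        gains = [information_gain(column, [row[j] for row in Y])
                 for j in range(n_labels)]
        scores.append(sum(gains) / n_labels)
    return scores


def construct_pair_labels(Y, rule):
    """Augment Y with one constructed label per pair of original labels."""
    pairs = list(combinations(range(len(Y[0])), 2))
    return [row + [rule(row[i], row[j]) for i, j in pairs] for row in Y]


def select_top_k(scores, k):
    """Indices of the k best-scoring features (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda f: (-scores[f], f))[:k]


# Hypothetical construction rules for a pair of binary labels.
def AND(a, b):
    return a & b


def XOR(a, b):
    return a ^ b


# Toy data: feature 0 copies label 0, feature 2 is label 0 XOR label 1.
X = [[0, 0, 0], [0, 0, 1], [1, 0, 1], [1, 1, 0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(select_top_k(br_feature_scores(X, Y), 2))  # [0, 1]
print(select_top_k(br_feature_scores(X, construct_pair_labels(Y, XOR)), 2))  # [0, 2]
```

In this toy case, feature 2 carries no Information Gain about either original label on its own, so the standard approach ranks it last; once the constructed XOR label is added, it moves into the top two. This mirrors, on a very small scale, the intuition that constructed labels can expose features tied to label dependence.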

Keywords

data handling; feature extraction; learning (artificial intelligence); LCFS settings; augmented dataset; dimensionality reduction tasks; information gain; label construction; label dependence; machine learning tasks; multilabel benchmark datasets; multilabel learning; single label datasets; standard multilabel feature selection algorithm; Accuracy; Educational institutions; Entropy; Laboratories; Loss measurement; Standards; Transforms; Binary Relevance; Information Gain; feature ranking; filter feature selection; systematic review;

fLanguage

English

Publisher

IEEE

Conference_Title

Intelligent Systems (BRACIS), 2014 Brazilian Conference on

Conference_Location

Sao Paulo

Type

conf

DOI

10.1109/BRACIS.2014.52

Filename

6984838