• DocumentCode
    3576333
  • Title
    Pseudo labels for imbalanced multi-label learning
  • Author
    Wenrong Zeng ; Xuewen Chen ; Hong Cheng
  • Author_Institution
    Univ. of Kansas, Lawrence, KS, USA
  • fYear
    2014
  • Firstpage
    25
  • Lastpage
    31
  • Abstract
    Classification in which each instance can be tagged with any of the 2^L possible subsets of L predefined labels is called multi-label classification. Multi-label classification is commonly applied in domains such as multimedia, text, web, and biological data analysis. The main challenge in multi-label classification is the dilemma between optimising label correlations over the exponentially large label powerset and ignoring label correlations under the binary relevance strategy (1-vs-all heuristic). Classification over the label powerset usually encounters a highly skewed data distribution, known as the imbalance problem. While the binary relevance strategy reduces the problem from exponential to linear, it totally neglects the label correlations. In this article, we propose a novel strategy of introducing Balanced Pseudo-Labels (BPL) to build more robust classifiers for multi-label classification, a problem in which imbalanced data is inherent. By incorporating the new balanced labels, we aim to increase the average distance among the distinct label vectors. In this way, we also encode the label correlations implicitly in the algorithm. Another advantage of the proposed method is that it can be combined with any classifier and it is proportional to linear label transformation. In the experiments, we use five multi-label benchmark data sets and compare our algorithm with state-of-the-art algorithms. Our algorithm outperforms them on standard multi-label evaluation measures in most scenarios.
  • Keywords
    learning (artificial intelligence); pattern classification; BPL; balanced pseudo-label; binary relevance strategy; data distribution; imbalanced data; imbalanced problem; label correlation; label powerset; label vector; linear label transformation; multilabel benchmark data set; multilabel classification; multilabel learning; pseudo label; robust classifier; Classification algorithms; Correlation; Kernel; Linear programming; Power line communications; Prediction algorithms; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Science and Advanced Analytics (DSAA), 2014 International Conference on
  • Type
    conf
  • DOI
    10.1109/DSAA.2014.7058047
  • Filename
    7058047
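
The trade-off described in the abstract can be illustrated with a small sketch. It is not the BPL algorithm from the paper; it only shows, on a hypothetical toy label matrix `Y`, how the label powerset view produces a few skewed classes out of 2^L possible ones, how binary relevance reduces this to L independent problems, and how the average Hamming distance among distinct label vectors (the quantity the abstract says BPL aims to increase) might be measured. All names and data below are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy label matrix (hypothetical data): 6 instances, L = 3 labels.
Y = [
    (1, 0, 0),
    (1, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 1),
    (1, 0, 1),
]
L = len(Y[0])


def powerset_class_counts(Y):
    """Label powerset view: each distinct label vector is one class."""
    return Counter(Y)


def binary_relevance_positive_counts(Y):
    """Binary relevance view: one binary problem per label.

    Returns the number of positive instances for each label.
    """
    return [sum(row[j] for row in Y) for j in range(len(Y[0]))]


def mean_pairwise_hamming(vectors):
    """Average Hamming distance over all pairs of distinct label vectors."""
    distinct = sorted(set(vectors))
    pairs = list(combinations(distinct, 2))
    total = sum(sum(a != b for a, b in zip(u, v)) for u, v in pairs)
    return total / len(pairs)


lp_counts = powerset_class_counts(Y)
br_counts = binary_relevance_positive_counts(Y)
avg_dist = mean_pairwise_hamming(Y)

print(f"possible label subsets: {2 ** L}, observed: {len(lp_counts)}")
print(f"powerset class sizes: {dict(lp_counts)}")
print(f"positives per label (binary relevance): {br_counts}")
print(f"mean Hamming distance among distinct label vectors: {avg_dist:.3f}")
```

In this toy data, one powerset class holds half of the instances while the others hold a single instance each, which is the skew the abstract calls the imbalance problem. Binary relevance needs only L classifiers, but nothing in the per-label counts records which labels occur together.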