• DocumentCode
    21560
  • Title
    Learning the Information Divergence
  • Author
    Dikmen, Onur ; Yang, Zhirong ; Oja, Erkki
  • Author_Institution
    Dept. of Inf. & Comput. Sci., Aalto Univ., Espoo, Finland
  • Volume
    37
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1442
  • Lastpage
    1454
  • Abstract
    Information divergence, which measures the difference between two nonnegative matrices or tensors, has found use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a suitable divergence. A large variety of divergences have been suggested and analyzed, but very few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence among a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the β-divergence family. Selecting the best β then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate α-divergence in terms of β-divergence, which enables automatic selection of α by maximum likelihood with reuse of the learning principle for β-divergence. Furthermore, we show the connections between γ- and β-divergences as well as Rényi- and α-divergences, such that our automatic selection framework is extended to non-separable divergences. Experiments on both synthetic and real-world data demonstrate that our method can quite accurately select the information divergence across different learning problems and various divergence families.
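    As a concrete illustration of the selection principle described in the abstract, the following is a minimal sketch, not the authors' estimator: it computes the β-divergence in its standard form and scores each candidate β with a saddle-point-style approximate Tweedie log-likelihood (power parameter p = 2 − β, dispersion profiled out). The saddle-point form, the grid of candidate β values, and the function names (beta_divergence, approx_tweedie_loglik, select_beta) are assumptions made for illustration; the paper's own approximated Tweedie distribution differs.

    # Minimal sketch (assumption-laden, not the paper's exact method): select beta
    # for positive data x given a model approximation mu, by maximizing a
    # saddle-point-style approximate Tweedie log-likelihood over a grid of betas.
    import numpy as np

    def beta_divergence(x, y, beta):
        """Standard beta-divergence d_beta(x || y), summed over all entries."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        if abs(beta - 1.0) < 1e-12:          # generalized Kullback-Leibler
            return np.sum(x * np.log(x / y) - x + y)
        if abs(beta) < 1e-12:                # Itakura-Saito
            return np.sum(x / y - np.log(x / y) - 1.0)
        return np.sum((x**beta + (beta - 1.0) * y**beta
                       - beta * x * y**(beta - 1.0)) / (beta * (beta - 1.0)))

    def approx_tweedie_loglik(x, mu, beta):
        """Approximate (saddle-point-style) Tweedie log-likelihood of positive
        data x with mean mu and power p = 2 - beta; the dispersion phi is
        profiled out. This form is an assumption, not the paper's construction."""
        x, mu = np.asarray(x, float), np.asarray(mu, float)
        n, p = x.size, 2.0 - beta
        deviance = 2.0 * beta_divergence(x, mu, beta)   # total Tweedie deviance
        phi = deviance / n                              # ML dispersion under this form
        return (-0.5 * n * np.log(2.0 * np.pi * phi)
                - 0.5 * p * np.sum(np.log(x))
                - deviance / (2.0 * phi))

    def select_beta(x, mu, betas=np.linspace(-1.0, 2.0, 61)):
        """Return the grid point with the highest approximate likelihood."""
        scores = [approx_tweedie_loglik(x, mu, b) for b in betas]
        return float(betas[int(np.argmax(scores))])

    # Toy usage: Poisson-like noise, for which beta = 1 (KL) is the natural regime.
    # In an NMF setting, mu would be the current low-rank reconstruction.
    rng = np.random.default_rng(0)
    mu = rng.gamma(2.0, 1.0, size=(50, 40))
    x = rng.poisson(mu) + 1e-6                           # keep data strictly positive
    print("selected beta:", select_beta(x, mu))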
  • Keywords
    information theory; learning (artificial intelligence); maximum likelihood estimation; α-divergence; β-divergence family; γ-divergence; Rényi-divergence; approximated Tweedie distribution; automatic selection framework; information divergence; learning principle reuse; machine learning problem; nonseparable divergences; standard maximum likelihood estimation; Approximation methods; Brain modeling; Medals; Standards; Stochastic processes; Tensile stress; Tweedie distribution; maximum likelihood; nonnegative matrix factorization; stochastic neighbor embedding
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2014.2366144
  • Filename
    6942194