• DocumentCode
    3014659
  • Title

    A unifying viewpoint of some clustering techniques using Bregman divergences and extensions to mixed data sets

  • Author

    Levasseur, Cécile ; Burdge, Brandon ; Kreutz-Delgado, Ken ; Mayer, Uwe F.

  • Author_Institution
    Jacobs Sch. of Eng., Univ. of California, San Diego, La Jolla, CA
  • fYear
    2008
  • fDate
    24-27 Dec. 2008
  • Firstpage
    56
  • Lastpage
    63
  • Abstract
    We present a general viewpoint using Bregman divergences and exponential family properties that contains as special cases the following three algorithms: 1) exponential family principal component analysis (exponential PCA), 2) semi-parametric exponential family principal component analysis (SP-PCA), and 3) Bregman soft clustering. This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that within this framework exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space. We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performances of the three algorithms in a clustering setting for mixed data sets.
  • Keywords
    Bayes methods; nonparametric statistics; pattern clustering; principal component analysis; probability; Bregman divergence; Bregman soft clustering; clustering technique; exponential family properties; linear constraint; low-dimensional parameter subspace; mixed data sets; mixed data-type hierarchical Bayes graphical model; nonparametric estimation; parameter space; point-mass probabilities; semiparametric exponential family principal component analysis; Artificial intelligence; Clustering algorithms; Data engineering; Data mining; Density functional theory; Euclidean distance; Graphical models; Jacobian matrices; Principal component analysis; Subspace constraints;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
  • Conference_Location
    Khulna
  • Print_ISBN
    978-1-4244-2135-0
  • Electronic_ISBN
    978-1-4244-2136-7
  • Type

    conf

  • DOI
    10.1109/ICCITECHN.2008.4803110
  • Filename
    4803110