• DocumentCode
    463688
  • Title
    Iterative Denoising using Jensen-Renyi Divergences with an Application to Unsupervised Document Categorization
  • Author
    Karakos, Damianos ; Khudanpur, Sanjeev ; Eisner, J. ; Priebe, Carey E.
  • Author_Institution
    Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
  • Volume
    2
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    Iterative denoising trees were used by Karakos et al. (2005) for unsupervised hierarchical clustering. The tree construction involves projecting the data onto low-dimensional spaces, as a means of smoothing their empirical distributions, as well as splitting each node based on an information-theoretic maximization objective. In this paper, we improve upon the work of Karakos et al. (2005) in two ways: (i) the amount of computation spent searching for a good projection at each node now adapts to the intrinsic dimensionality of the data observed at that node; (ii) the objective at each node is to find a split which maximizes a generalized form of mutual information, the Jensen-Renyi divergence; this is followed by an iterative Naive Bayes classification. The single parameter α of the Jensen-Renyi divergence is chosen based on the "strapping" methodology, which learns a meta-classifier on a related task. Compared with the sequential information bottleneck method, our procedure produces state-of-the-art results on the task of unsupervised categorization of documents from the "20 Newsgroups" dataset.
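For readers unfamiliar with the split objective named in the abstract, the following is a minimal sketch of the standard Jensen-Renyi divergence, defined as the Renyi entropy of a weighted mixture minus the weighted average of the individual Renyi entropies. It is not taken from the paper: the function names `renyi_entropy` and `jensen_renyi`, the use of natural logarithms, and the toy distributions are assumptions made for illustration only. In the limit α → 1 the quantity reduces to the Jensen-Shannon divergence, which is why it can be read as a generalized form of mutual information.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy (in nats) of a discrete distribution p, for alpha > 0."""
    if abs(alpha - 1.0) < 1e-12:
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def jensen_renyi(dists, weights, alpha):
    """Entropy of the weighted mixture minus the weighted mean of the entropies."""
    n_outcomes = len(dists[0])
    mixture = [sum(w * d[j] for w, d in zip(weights, dists))
               for j in range(n_outcomes)]
    mean_entropy = sum(w * renyi_entropy(d, alpha)
                       for w, d in zip(weights, dists))
    return renyi_entropy(mixture, alpha) - mean_entropy

# Hypothetical toy example: two word distributions with disjoint support.
left = [1.0, 0.0]
right = [0.0, 1.0]
divergence = jensen_renyi([left, right], [0.5, 0.5], alpha=0.5)
```

Under these assumptions, identical distributions give a divergence of zero, and for α in (0, 1) the Renyi entropy is concave, so the divergence is non-negative. How the paper estimates the distributions and selects α is not shown here.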
  • Keywords
    Bayes methods; document image processing; image classification; image denoising; iterative methods; trees (mathematics); Jensen-Renyi divergences; Naive Bayes classification; iterative denoising trees; meta-classifier; sequential information bottleneck method; strapping methodology; unsupervised document categorization; unsupervised hierarchical clustering; Classification tree analysis; Computer vision; Decision trees; Distributed computing; Mathematics; Mutual information; Natural languages; Noise reduction; Smoothing methods; Statistical distributions; Unsupervised learning; clustering methods; information theory; text processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2007.366284
  • Filename
    4217457