• DocumentCode
    2504205
  • Title

    On provable exact low-rank recovery in topic models

  • Author

    Behmardi, Behrouz ; Raich, Raviv

  • Author_Institution
    Sch. of EECS, Oregon State Univ., Corvallis, OR, USA
  • fYear
    2011
  • fDate
    28-30 June 2011
  • Firstpage
    265
  • Lastpage
    268
  • Abstract
    In the past few years, probabilistic topic models have been developed and applied to problems in text document classification and computer vision. Such models provide a probabilistic framework for characterizing a corpus of documents (or images) in the bag-of-words representation. A key feature of such models is that a low-dimensional representation is facilitated through latent topic variables. Most inference algorithms in topic models assume a fixed number of topics and determine the number of topics empirically. In this paper, we consider the problem of identifying the number of topics in topic models. We present a rank minimization framework and provide sufficient conditions that guarantee exact recovery of the number of topics. Moreover, we propose a heuristic convex relaxation to the rank minimization. Using simulations, we show that the proposed convex relaxation provides exact rank recovery under the sufficient conditions proposed for the rank minimization problem.
  • Keywords
    computer vision; inference mechanisms; pattern classification; probability; text analysis; bag-of-words representation; document corpus characterization; heuristic convex relaxation; inference algorithms; low rank matrix recovery; probabilistic topic models; provable exact low-rank recovery; rank minimization framework; text document classification; Computational modeling; Linear matrix inequalities; Matching pursuit algorithms; Minimization; Noise; Optimization; Probabilistic logic; nuclear norm minimization; topic models
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Statistical Signal Processing Workshop (SSP), 2011 IEEE
  • Conference_Location
    Nice
  • ISSN
    pending
  • Print_ISBN
    978-1-4577-0569-4
  • Type

    conf

  • DOI
    10.1109/SSP.2011.5967677
  • Filename
    5967677