Title :
On provable exact low-rank recovery in topic models
Author :
Behmardi, Behrouz ; Raich, Raviv
Author_Institution :
Sch. of EECS, Oregon State Univ., Corvallis, OR, USA
Abstract :
In the past few years, probabilistic topic models have been developed and applied to problems in text document classification and computer vision. Such models provide a probabilistic framework for characterizing a corpus of documents (or images) in the bag-of-words representation. A key feature of such models is that latent topic variables facilitate a low-dimensional representation. Most inference algorithms for topic models assume a fixed number of topics and determine that number empirically. In this paper, we consider the problem of identifying the number of topics in topic models. We present a rank minimization framework and provide sufficient conditions that guarantee exact recovery of the number of topics. Moreover, we propose a heuristic convex relaxation of the rank minimization problem. Using simulations, we show that the proposed convex relaxation provides exact rank recovery under the sufficient conditions proposed for the rank minimization problem.
Keywords :
computer vision; inference mechanisms; pattern classification; probability; text analysis; bag-of-words representation; document corpus characterization; heuristic convex relaxation; inference algorithms; low rank matrix recovery; probabilistic topic models; provable exact low-rank recovery; rank minimization framework; text document classification; Computational modeling; Linear matrix inequalities; Matching pursuit algorithms; Minimization; Noise; Optimization; Probabilistic logic; nuclear norm minimization; topic models
Conference_Title :
Statistical Signal Processing Workshop (SSP), 2011 IEEE
Conference_Location :
Nice
Print_ISBN :
978-1-4577-0569-4
DOI :
10.1109/SSP.2011.5967677