DocumentCode :
3678652
Title :
Most large topic models are approximately separable
Author :
Weicong Ding;Prakash Ishwar;Venkatesh Saligrama
Author_Institution :
Department of ECE, Boston University, MA 02215, USA
Year :
2015
Firstpage :
199
Lastpage :
203
Abstract :
Separability has recently been leveraged as a key structural condition in topic models to develop asymptotically consistent algorithms with polynomial statistical and computational efficiency guarantees. Separability corresponds to the presence of at least one novel word for each topic. Empirical estimates of topic matrices for Latent Dirichlet Allocation models have been observed to be approximately separable. While separability is a convenient structural property, it has been viewed as an overly restrictive condition. In this paper we explicitly demonstrate that separability is, in fact, an inevitable consequence of high dimensionality. In particular, we prove that when the columns of the topic matrix are independently sampled from a Dirichlet distribution, the resulting topic matrix will be approximately separable with probability tending to one as the number of rows (vocabulary size) scales to infinity sufficiently faster than the number of columns (topics). The proof combines concentration-of-measure results with properties of the Dirichlet distribution and union-bounding arguments. Our proof techniques extend to other priors for general nonnegative matrices.
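As a quick illustration of the claimed phenomenon (not part of the paper), the following Python sketch samples topic-matrix columns i.i.d. from a symmetric Dirichlet prior and measures how many topics acquire an approximately novel word as the vocabulary size W grows relative to the number of topics K. The eps-novelty test used here (a word carrying at least a (1 - eps) fraction of its total cross-topic mass in a single topic) and the parameter values alpha and eps are simplifying assumptions chosen for illustration, not the paper's exact definitions.

import numpy as np

rng = np.random.default_rng(0)

def separable_fraction(W, K, alpha=0.05, eps=0.1, trials=20):
    # Average, over independent trials, the fraction of the K topics that
    # have at least one eps-novel word in a W x K topic matrix whose
    # columns are drawn i.i.d. from a symmetric Dirichlet(alpha) prior.
    fracs = []
    for _ in range(trials):
        beta = rng.dirichlet(np.full(W, alpha), size=K).T  # (W, K); columns sum to 1
        row_mass = beta.sum(axis=1, keepdims=True)         # each word's mass across topics
        # Word w is eps-novel for topic k if topic k carries at least a
        # (1 - eps) fraction of word w's total cross-topic mass.
        novel = (beta >= (1.0 - eps) * row_mass) & (row_mass > 0)
        fracs.append(novel.any(axis=0).mean())             # topics owning a novel word
    return float(np.mean(fracs))

# The fraction approaches 1 as the vocabulary W outgrows the topic count K.
for W in (10, 30, 100, 1000):
    print(f"W = {W:4d}, K = 10: fraction of topics with an eps-novel word ~ "
          f"{separable_fraction(W, K=10):.2f}")

A smaller alpha makes each column sparser, which strengthens the effect; the point of the sketch is the qualitative trend toward full separability as W grows, not the exact numbers.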
Keywords :
"Computational modeling","Semantics","Computers"
Publisher :
ieee
Conference_Title :
Information Theory and Applications Workshop (ITA), 2015
Type :
conf
DOI :
10.1109/ITA.2015.7308989
Filename :
7308989