Title :
Evaluation of Stability and Similarity of Latent Dirichlet Allocation
Author :
Jun Tang ; Ruilong Huo ; Jiali Yao
Author_Institution :
China COE, Pivotal, Beijing, China
Abstract :
Latent Dirichlet Allocation (LDA) is an unsupervised statistical method that models documents, discovers latent semantic topics from a large set of documents, and categorizes the documents into the learned topics. In this paper, we first introduce LDA and its distributed version, Parallel LDA (PLDA), along with some popular implementations. We then propose a systematic solution to evaluate the stability and similarity of the trained models and classification results of LDA/PLDA. We address three key challenges within the evaluation solution: (i) topic matching in the Kullback-Leibler (KL) divergence calculation, (ii) calculation of stability using KL divergence and interpretation of the relationship between KL divergence and the stability of the trained model and the classification results, and (iii) calculation and evaluation of the similarity of trained models and classification results. Finally, we experiment with real-life datasets to show that our solution is both sufficient and efficient.
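The stability evaluation described in the abstract hinges on first pairing topics between two trained models and then computing KL divergence over the matched pairs. A minimal sketch of that idea in Python — the function names, the greedy matching strategy, and the use of symmetric KL are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (e.g. topic-word vectors).
    A small eps avoids log(0) and division by zero."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def match_topics(model_a, model_b):
    """Greedily pair each topic in model_a with its closest (lowest
    symmetric-KL) not-yet-used topic in model_b.
    Returns a list of (index_a, index_b, divergence) triples."""
    used = set()
    pairs = []
    for i, topic_a in enumerate(model_a):
        best_j, best_d = None, np.inf
        for j, topic_b in enumerate(model_b):
            if j in used:
                continue
            # Symmetrize KL so the pairing does not depend on argument order.
            d = 0.5 * (kl_divergence(topic_a, topic_b)
                       + kl_divergence(topic_b, topic_a))
            if d < best_d:
                best_j, best_d = j, d
        used.add(best_j)
        pairs.append((i, best_j, best_d))
    return pairs

def model_stability(model_a, model_b):
    """Mean symmetric KL over matched topic pairs; lower means the two
    training runs produced more similar (more stable) topics."""
    pairs = match_topics(model_a, model_b)
    return float(np.mean([d for _, _, d in pairs]))
```

Two runs over the same corpus can then be compared with `model_stability(run1_topics, run2_topics)`, where each argument is a list of topic-word probability vectors; an optimal (rather than greedy) matching could instead use the Hungarian algorithm.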
Keywords :
data mining; distributed processing; document handling; pattern classification; statistical analysis; unsupervised learning; KL divergence calculation; Kullback-Leibler divergence calculation; LDA classification; PLDA classification; distributed parallel LDA; document modelling; latent Dirichlet allocation similarity evaluation; latent Dirichlet allocation stability evaluation; latent semantic topic discovery; topic matching; trained model similarity calculation; trained model similarity evaluation; unsupervised statistical method; Classification algorithms; Computational modeling; Electromagnetic compatibility; Google; Measurement; Stability analysis; Systematics; LDA; evaluation; similarity; stability
Conference_Titel :
2013 Fourth World Congress on Software Engineering (WCSE)
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4799-2882-8
DOI :
10.1109/WCSE.2013.17