Title :
Tree Labeled LDA: A Hierarchical model for web summaries
Author :
Slutsky, Anton ; Xiaohua Hu ; Yuan An
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
Abstract :
We study the application of hierarchical topic models to representing the content of website summaries. We concentrate on the DMOZ collection of Web extracts and propose a novel Tree Labeled LDA (tLLDA) algorithm that infers topic models using the collection's manually compiled ontology. The algorithm takes advantage of the ontology structure and infers topic models by jointly modeling word and ontology node assignments for documents. We evaluate our topic modeling approach against four state-of-the-art algorithms (Labeled LDA, Hierarchically Labeled LDA, Hierarchically Supervised LDA and Supervised LDA). Topic models produced by tLLDA outperform those of the other algorithms in terms of perplexity on all test sets and in terms of accuracy in all but one test case.
Keywords :
Internet; Web sites; ontologies (artificial intelligence); tree data structures; DMOZ collection; Web extracts; Website summaries; hierarchical topic models; manually compiled ontology; ontology node assignments; tLLDA; tree labeled LDA; Accuracy; Data models; Educational institutions; Ontologies; Predictive models; Vectors; Vocabulary;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691745