DocumentCode
659596
Title
Tree Labeled LDA: A Hierarchical model for web summaries
Author
Slutsky, Anton ; Xiaohua Hu ; Yuan An
Author_Institution
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
134
Lastpage
140
Abstract
We study the applications of hierarchical topic models to represent the content of website summaries. We concentrate on the DMOZ collection of Web extracts and propose a novel Tree Labeled LDA (tLLDA) algorithm to infer topic models using its manually compiled ontology. The algorithm takes advantage of the ontology structure and infers topic models by jointly modeling word and ontology node assignments for documents. We evaluate the performance of our topic modeling approach against that of four state-of-the-art algorithms (Labeled LDA, Hierarchically Labeled LDA, Hierarchically Supervised LDA and Supervised LDA) and show improvement in terms of perplexity and accuracy. Our evaluation shows that topic models produced by tLLDA outperform other algorithms in terms of perplexity for all test sets and all but one test case in terms of accuracy.
Keywords
Internet; Web sites; ontologies (artificial intelligence); tree data structures; DMOZ collection; Web extracts; Website summaries; hierarchical topic models; manually compiled ontology; ontology node assignments; tLLDA; tree labeled LDA; Accuracy; Data models; Educational institutions; Ontologies; Predictive models; Vectors; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691745
Filename
6691745