  • DocumentCode
    659596
  • Title
    Tree Labeled LDA: A Hierarchical Model for Web Summaries
  • Author
    Slutsky, Anton ; Hu, Xiaohua ; An, Yuan
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    134
  • Lastpage
    140
  • Abstract
    We study the application of hierarchical topic models to representing the content of website summaries. We concentrate on the DMOZ collection of Web extracts and propose a novel Tree Labeled LDA (tLLDA) algorithm that infers topic models using the collection's manually compiled ontology. The algorithm takes advantage of the ontology structure and infers topic models by jointly modeling word and ontology node assignments for documents. We evaluate our topic modeling approach against four state-of-the-art algorithms (Labeled LDA, Hierarchically Labeled LDA, Hierarchically Supervised LDA and Supervised LDA) and show improvement in both perplexity and accuracy. Topic models produced by tLLDA outperform the other algorithms in perplexity on all test sets and in accuracy on all but one test case.
  • Keywords
    Internet; Web sites; ontologies (artificial intelligence); tree data structures; DMOZ collection; Web extracts; Website summaries; hierarchical topic models; manually compiled ontology; ontology node assignments; tLLDA; tree labeled LDA; Accuracy; Data models; Educational institutions; Ontologies; Predictive models; Vectors; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691745
  • Filename
    6691745