  • DocumentCode
    46240
  • Title
    Automatic Taxonomy Construction from Keywords via Scalable Bayesian Rose Trees
  • Author
    Yangqiu Song; Shixia Liu; Xueqing Liu; Haixun Wang
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • Volume
    27
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1861
  • Lastpage
    1874
  • Abstract
    In this paper, we study the challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because i) keywords give users the flexibility and ease of characterizing a specific domain; and ii) in many applications, such as online advertising, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy from the keyword set alone. We argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledgebase and keyword search to supply the required knowledge and context. Then, we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n^2 log n) to O(n log n) using a nearest-neighbor-based approximation, so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large-scale experiments to show the effectiveness and efficiency of our approach. A real-life example of building an insurance-related web search query taxonomy illustrates the usefulness of our approach for specific domains.
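    A back-of-the-envelope comparison (an illustration added here, not a result reported in the paper; constant factors are ignored and base-2 logarithms are assumed) shows why the complexity reduction matters at the one-million-keyword scale: the asymptotic speed-up factor is n itself.

    ```latex
    \frac{n^{2}\log_2 n}{n\log_2 n} = n, \qquad
    n = 10^{6}:\quad n\log_2 n \approx 10^{6}\times 19.93 \approx 2.0\times 10^{7}, \quad
    n^{2}\log_2 n \approx 2.0\times 10^{13}.
    ```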
  • Keywords
    belief networks; computational complexity; query processing; text analysis; Bayesian rose trees; automatic taxonomy construction; complexity; domain-specific taxonomy; general-purpose knowledgebase; hierarchical taxonomy; keyword search; nearest-neighbor-based approximation; Approximation methods; Buildings; Clustering algorithms; Context; Insurance; Search engines; Taxonomy; Bayesian Rose Tree; Hierarchical Clustering; Keyword Taxonomy Building; Short Text Conceptualization
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2015.2397432
  • Filename
    7029112