• DocumentCode
    3772299
  • Title
    Agglomerative Hierarchical Clustering for Information Retrieval Using Latent Semantic Index
  • Author
    Hansaem Park;Kyunglag Kwon;Abdel-ilah Zakaria Khiati;Jeungmin Lee;In-Jeong Chung
  • Author_Institution
    Dept. of Comput. Sci., Korea Univ., Sejong, South Korea
  • fYear
    2015
  • Firstpage
    426
  • Lastpage
    431
  • Abstract
    Web clustering has been an active research field in Information Retrieval (IR) for many years. Given the number of web sites that major search engines return for an ambiguous query, many researchers have turned to Search Results Clustering (SRC), which aims to group long lists of results into topically coherent clusters. Although some well-known algorithms already exist, results show that there is still work to be done in many respects. This paper proposes a method that integrates Latent Semantic Indexing (LSI) with Agglomerative Hierarchical Clustering (AHC). The rationale for combining these two methods is to counter the synonymy and polysemy problems that occur when previous SRC methods use the bag-of-words model. Moreover, we observe that the clusters produced by previous SRC methods are not satisfactory and can be clustered further, which gives room for other hidden topics to surface. To verify the proposed method, we use two common datasets, AMBIguous ENTries (AMBIENT) and MORE Sense-tagged QUEries (MORESQUE), and show a significant improvement in clustering quality.
  • Keywords
    "Clustering algorithms","Large scale integration","Semantics","Search engines","Indexing"
  • Publisher
    IEEE
  • Conference_Titel
    Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/SmartCity.2015.108
  • Filename
    7463762