DocumentCode
3772299
Title
Agglomerative Hierarchical Clustering for Information Retrieval Using Latent Semantic Index
Author
Hansaem Park;Kyunglag Kwon;Abdel-ilah Zakaria Khiati;Jeungmin Lee;In-Jeong Chung
Author_Institution
Dept. of Comput. Sci., Korea Univ., Sejong, South Korea
fYear
2015
Firstpage
426
Lastpage
431
Abstract
Web clustering has been an active research field in Information Retrieval (IR) for many years. Given the number of web sites that major search engines return for an ambiguous query, many researchers have turned to Search Results Clustering (SRC), which aims to group long lists of results into topically comprehensible clusters. Although some well-known algorithms already exist, results show that there is still more work to be done in many aspects. This paper proposes a method that integrates Latent Semantic Indexing (LSI) with Agglomerative Hierarchical Clustering (AHC). The rationale for combining these two methods is to counter the synonymy and polysemy problems that arise when previous SRC methods use the bag-of-words model. Moreover, we observe that the clusters produced by previous SRC methods are unsatisfactory and can be clustered further. Thus, we give room for other hidden topics to be shown. To verify the proposed method, we use two common datasets, AMBIguous ENTries (AMBIENT) and MORE Sense-tagged QUEries (MORESQUE), and show a significant improvement in clustering quality.
Keywords
"Clustering algorithms","Large scale integration","Semantics","Search engines","Indexing"
Publisher
ieee
Conference_Titel
Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/SmartCity.2015.108
Filename
7463762
Link To Document