DocumentCode
3069776
Title
A Document Clustering Approach for Search Engines
Author
Tsai, Chun-Wei ; Liang, Ting-Wen ; Ho, Jiun-Huei ; Yang, Chu-Sing ; Chiang, Ming-Chao
Author_Institution
Nat. Sun Yat-sen Univ., Kaohsiung
Volume
2
fYear
2006
fDate
8-11 Oct. 2006
Firstpage
1050
Lastpage
1055
Abstract
This paper presents a new Internet search engine system called Document Clustering for Search Engines (DCSE). The system addresses two challenges faced by search engines: (1) the relevance of search results to a user query and (2) information coverage. DCSE is built on a meta-search engine that integrates information retrieval (IR), information extraction (IE), a genetic algorithm (GA), and a document clustering algorithm into a single system. It uses information extraction techniques and vector space model (VSM) calculations to determine the relevance of the retrieved data, and then categorizes the data with information retrieval and document clustering algorithms to refine the results. Users receive information that has been scored and sorted, together with web links ranked by relevance, which reduces the time they spend filtering out irrelevant data.
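The abstract names vector space model (VSM) calculations as the basis for relevance ranking but does not specify them. The following is a minimal sketch of the general technique, assuming TF-IDF term weights and cosine similarity, which is a common VSM formulation and not necessarily the one DCSE uses. The function names (tokenize, tfidf_vectors, cosine, rank) and the sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document,
    plus the IDF table so a query can be weighted the same way."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a
    # small positive weight instead of dropping to zero.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: freq * idf[t] for t, freq in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, doc_index) pairs sorted by descending relevance."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the collection carry no weight.
    q_vec = {t: f * idf[t] for t, f in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    sample_docs = [
        "genetic algorithm for document clustering",
        "cooking pasta at home",
        "search engines rank web documents",
    ]
    for score, index in rank("document clustering", sample_docs):
        print(f"{score:.3f}  {sample_docs[index]}")
```

In a meta-search setting such as the one the abstract describes, the input documents would be the snippets or pages returned by the underlying engines. A clustering step could then group the resulting vectors, though that step is outside this sketch.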
Keywords
Internet; genetic algorithms; information retrieval; search engines; Internet search engine system; document clustering algorithm; genetic algorithm; information coverage; information extraction; information extraction techniques; information retrieval; meta-search engine; user query; vector space model calculations; Catalogs; Clustering algorithms; Computer science; Data mining; IP networks; Information filtering; Information filters; Information retrieval; Internet; Search engines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location
Taipei
Print_ISBN
1-4244-0099-6
Electronic_ISBN
1-4244-0100-3
Type
conf
DOI
10.1109/ICSMC.2006.384538
Filename
4273986