DocumentCode :
3069776
Title :
A Document Clustering Approach for Search Engines
Author :
Tsai, Chun-Wei ; Liang, Ting-Wen ; Ho, Jiun-Huei ; Yang, Chu-Sing ; Chiang, Ming-Chao
Author_Institution :
Nat. Sun Yat-sen Univ., Kaohsiung
Volume :
2
fYear :
2006
fDate :
8-11 Oct. 2006
Firstpage :
1050
Lastpage :
1055
Abstract :
This paper presents a new Internet search engine system called document clustering for search engines (DCSE). The system addresses two challenges faced by search engines: (1) the relevance of search results to a user query and (2) information coverage. DCSE is built on a meta-search engine that integrates information retrieval (IR), information extraction (IE), a genetic algorithm (GA), and a document clustering algorithm into a single system. It uses information extraction techniques and vector space model (VSM) calculations to determine the relevance of the retrieved data, and then categorizes the data using information retrieval and the document clustering algorithm to refine the results. Users receive scored and sorted information, together with web links ranked by relevance, which reduces the time they spend filtering out irrelevant data.
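The abstract does not say which term weighting DCSE applies inside its VSM calculations. As a minimal sketch, assuming TF-IDF weights with a smoothed IDF and cosine similarity (a common VSM formulation, not necessarily the authors' method), relevance ranking of documents against a query could look like this. The function names `tokenize`, `idf_weights`, `tfidf_vector`, `cosine`, and `rank` are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(docs_tokens):
    """Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, where df counts documents containing the term."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse vector: raw term frequency times IDF; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (document index, score) pairs sorted by descending cosine similarity to the query."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = idf_weights(docs_tokens)
    doc_vecs = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, dv)) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "genetic algorithm for document clustering",
        "search engine ranking with the vector space model",
        "recipes for baking bread at home",
    ]
    for index, score in rank("vector space search engine", docs):
        print(f"{score:.3f}  {docs[index]}")
```

In a meta-search setting such as the one described, the `docs` list would stand in for the snippets or pages gathered from the underlying engines; the clustering and GA stages mentioned in the abstract are not modeled here.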
Keywords :
Internet; genetic algorithms; information retrieval; search engines; Internet search engine system; document clustering algorithm; genetic algorithm; information coverage; information extraction; information extraction techniques; information retrieval; meta-search engine; user query; vector space model calculations; Catalogs; Clustering algorithms; Computer science; Data mining; IP networks; Information filtering; Information filters; Information retrieval; Internet; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
1-4244-0099-6
Electronic_ISBN :
1-4244-0100-3
Type :
conf
DOI :
10.1109/ICSMC.2006.384538
Filename :
4273986