• DocumentCode
    3069776
  • Title
    A Document Clustering Approach for Search Engines
  • Author
    Tsai, Chun-Wei ; Liang, Ting-Wen ; Ho, Jiun-Huei ; Yang, Chu-Sing ; Chiang, Ming-Chao
  • Author_Institution
    National Sun Yat-sen University, Kaohsiung
  • Volume
    2
  • fYear
    2006
  • fDate
    8-11 Oct. 2006
  • Firstpage
    1050
  • Lastpage
    1055
  • Abstract
    This paper presents a new Internet search engine system called Document Clustering for Search Engines (DCSE). The system addresses two challenges faced by search engines: (1) the relevance of search results to a user query and (2) information coverage. DCSE is built on a meta-search engine that integrates information retrieval (IR), information extraction (IE), a genetic algorithm (GA), and a document clustering algorithm into a single system. It uses information extraction techniques and vector space model (VSM) calculations to determine the relevance of the retrieved data, then categorizes the data via information retrieval and a document clustering algorithm to refine the results. Users receive information that has been scored and sorted, along with web links ranked by relevance, which reduces the time they spend filtering out irrelevant data.
  • Keywords
    Internet; genetic algorithms; information retrieval; search engines; Internet search engine system; document clustering algorithm; genetic algorithm; information coverage; information extraction; information extraction techniques; meta-search engine; user query; vector space model calculations; Catalogs; Clustering algorithms; Computer science; Data mining; IP networks; Information filtering; Information filters
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
  • Conference_Location
    Taipei
  • Print_ISBN
    1-4244-0099-6
  • Electronic_ISBN
    1-4244-0100-3
  • Type
    conf
  • DOI
    10.1109/ICSMC.2006.384538
  • Filename
    4273986
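
The abstract mentions vector space model (VSM) calculations for judging relevance. As a rough illustration only, and not the authors' implementation, the sketch below ranks documents against a query by cosine similarity. It assumes raw term-frequency weights and whitespace tokenization; the function names tf_vector, cosine_similarity, and rank_by_relevance are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Lowercase, split on whitespace, and count term frequencies (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_relevance(query, documents):
    """Return (score, document) pairs sorted by descending similarity to the query."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling rank_by_relevance("document clustering", docs) on a small list of strings places documents that share query terms first. A fuller system along the lines the abstract describes would likely add term weighting, clustering, and a genetic algorithm, none of which are shown here.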