• DocumentCode
    3343264
  • Title
    Freshness of Web search engines: Improving performance of Web search engines using data mining techniques
  • Author
    Kharazmi, S. ; Nejad, A.F. ; Abolhassani, H.
  • Author_Institution
    Dept. of Comput. Eng., WI Lab., Payame Noor Univ., Tehran, Iran
  • fYear
    2009
  • fDate
    9-12 Nov. 2009
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    The growing use of Web-based information retrieval systems, such as general-purpose search engines, together with the dynamic nature of the Web, makes it necessary to maintain these systems continually. Crawlers support this process by following hyperlinks in Web pages to automatically download new and updated pages. Freshness (recency) is one of the most important maintenance factors for Web search engine crawlers, since refreshing the crawled collection can take weeks to months. Many large Web crawlers start from seed pages, fetch every link from them, and repeat this process continually without any policy that would help them crawl more effectively and improve their performance. We believe that data mining techniques can help improve freshness by extracting knowledge from crawling data. In this paper we propose a Web crawler that uses knowledge extracted by data mining techniques as its crawling policies. For this purpose, we add a component that collects additional crawling information. The crawler begins with non-preferential crawling. After a few crawls, it is trained by applying mining techniques to the crawling data, and it then uses the resulting policies for preferential crawling to improve freshness. Our results show that crawling with the derived policies achieves better freshness than generic general-purpose Web crawlers.
  • Keywords
    Internet; data mining; information retrieval systems; search engines; Web pages; Web search engine crawlers; data mining techniques; non-preferential crawling; Authentication; Availability; Communication system security; Data mining; Law enforcement; Legal factors; Privacy; Public key cryptography; Search engines; Web search;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Internet Technology and Secured Transactions, 2009. ICITST 2009. International Conference for
  • Conference_Location
    London
  • Print_ISBN
    978-1-4244-5647-5
  • Type
    conf
  • DOI
    10.1109/ICITST.2009.5402607
  • Filename
    5402607
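The abstract describes a two-phase approach: crawl without preference first while collecting extra crawling information, mine that data, and then crawl preferentially. Because the abstract does not name the mining technique, the sketch below substitutes a simple, assumed one. It estimates each page's change rate from past visits and spends a limited revisit budget on the pages most likely to have changed. `CrawlHistory`, `preferential_order`, `non_preferential_order`, and the Laplace smoothing are hypothetical choices made for illustration. They are not the authors' implementation.

```python
import heapq
from collections import defaultdict


class CrawlHistory:
    """Hypothetical 'additional crawling information' component: for each
    URL, count the visits and the visits that found the page changed."""

    def __init__(self):
        self.visits = defaultdict(int)
        self.changes = defaultdict(int)

    def record(self, url, changed):
        # Called once per fetch during crawling.
        self.visits[url] += 1
        if changed:
            self.changes[url] += 1

    def change_rate(self, url):
        # Assumed estimator: Laplace-smoothed fraction of visits that saw a
        # change, so a never-visited URL scores 0.5 instead of dividing by 0.
        return (self.changes[url] + 1) / (self.visits[url] + 2)


def non_preferential_order(urls):
    """Phase 1: revisit every known URL in its given order, with no policy."""
    return list(urls)


def preferential_order(urls, history, budget):
    """Phase 2 (assumed policy): spend a limited revisit budget on the URLs
    with the highest estimated change rate, highest first."""
    return heapq.nlargest(budget, urls, key=history.change_rate)
```

For example, suppose a few non-preferential crawls observed page `a` changing on 2 of 3 visits, `b` on 0 of 3, and `c` on 3 of 3. The smoothed rates are then 0.6, 0.2, and 0.8. `preferential_order(["a", "b", "c"], history, budget=2)` therefore revisits `c` and then `a`, and skips the stable page `b`.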