• DocumentCode
    2132692
  • Title
    AuToCrawler: an integrated system for automatic topical crawler
  • Author
    Tsay, Jyh-Jong ; Shih, Chen-Yang ; Wu, Bo-Liang
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., National Chung Cheng Univ., Chiayi, Taiwan
  • fYear
    2005
  • fDate
    2005
  • Firstpage
    462
  • Lastpage
    467
  • Abstract
    A topical (or focused) crawler is a Web crawler that searches for and retrieves Web pages related to a specific topic from the World Wide Web. Rather than downloading all accessible Web pages, a topical crawler analyzes the frontier of the crawled region so that it visits only the portion of the Web that contains relevant pages while trying to skip irrelevant regions. This leads to significant savings in both computation and communication resources. In this paper, we present an integrated topical crawler: AuToCrawler. Its main features are a user interest specification module, which mediates between users and search engines to identify target examples and keywords that together specify the topic of interest, and a URL ordering strategy that combines features of several previous approaches and achieves significant improvement. It also provides a graphical user interface through which users can evaluate and visualize the crawling results, which can then be used as feedback to reconfigure the crawler.
  • Keywords
    data visualisation; graphical user interfaces; information retrieval; learning (artificial intelligence); search engines; AuToCrawler; URL ordering; Web crawler; Web page retrieval; Web page searching; World Wide Web; automatic topical crawler; crawler reconfiguration; feedback; focused crawler; graphic user interface; keywords; machine learning; Computer science; Crawlers; Graphics; Uniform resource locators; User interfaces; Visualization; Web pages; Web sites; Topical Crawler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Science, 2005. Fourth Annual ACIS International Conference on
  • Print_ISBN
    0-7695-2296-3
  • Type
    conf
  • DOI
    10.1109/ICIS.2005.33
  • Filename
    1515448
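
The abstract describes a crawler that orders the URLs on its frontier and avoids expanding irrelevant regions. The record does not give AuToCrawler's actual ordering strategy, so the following is only a minimal sketch of a generic best-first frontier, written under stated assumptions: `keyword_score`, `best_first_crawl`, the `fetch` callback, and the `threshold` parameter are all hypothetical names introduced here for illustration, and keyword overlap stands in for whatever relevance measure the paper uses.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def keyword_score(text: str, keywords: Iterable[str]) -> float:
    """Fraction of topic keywords found in the page text (assumed relevance measure)."""
    words = set(text.lower().split())
    kws = [k.lower() for k in keywords]
    if not kws:
        return 0.0
    return sum(1 for k in kws if k in words) / len(kws)


def best_first_crawl(
    seeds: List[str],
    fetch: Callable[[str], Tuple[str, List[str]]],
    keywords: List[str],
    max_pages: int = 10,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Visit pages in order of the relevance of the page that linked to them.

    `fetch(url)` is a hypothetical callback returning (page_text, out_links).
    Pages scoring below `threshold` are recorded but not expanded, which is
    one simple way to skip irrelevant regions.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited: List[Tuple[str, float]] = []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = keyword_score(text, keywords)
        visited.append((url, score))
        if score < threshold:
            continue  # do not follow links out of an irrelevant page
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return visited
```

In this sketch a link's priority is simply the score of its parent page. A strategy that combines several signals, as the abstract says AuToCrawler's does, would replace that single number with a combined estimate.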