• DocumentCode
    483197
  • Title

    Design and Implementation of University Focused Crawler Based on BP Network Classifier

  • Author

    Jiang, Hua ; Han, Bing ; Ying Lin ; Dan Zuo ; Yong Xing Ge

  • Author_Institution
    Comput. Sch., Northeast Normal Univ., Changchun
  • fYear
    2009
  • fDate
    23-25 Jan. 2009
  • Firstpage
    44
  • Lastpage
    47
  • Abstract
    The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. Crawling the entire Web quickly is an expensive, unrealistic goal because of the hardware and network resources it requires. A focused crawler is an agent that targets a particular topic: it visits and gathers only a narrow, relevant segment of the Web while trying not to waste resources on irrelevant material. It can be used to build domain-specific Web search portals and online personalized search tools. In this paper, we describe the design and implementation of a university focused crawler that uses a BP (backpropagation) network classifier to predict which links lead to relevant pages. We present the flow of the system, discuss its performance, and report experimental results. Our experiments show that the BP classifier performs very well in obtaining accurate, relevant university Web resources.
  • Keywords
    Internet; backpropagation; online front-ends; pattern classification; search engines; BP network classifier; World-Wide Web; domain-specific Web search portal; online personalized search tool; university focused crawler; Computer networks; Crawlers; Data mining; Educational institutions; Hardware; Portals; Uniform resource locators; Waste materials; Web pages; BP network; Crawler; Web resources; domain specific
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Discovery and Data Mining, 2009. WKDD 2009. Second International Workshop on
  • Conference_Location
    Moscow
  • Print_ISBN
    978-0-7695-3543-2
  • Type

    conf

  • DOI
    10.1109/WKDD.2009.77
  • Filename
    4771874
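
The abstract describes a focused crawler that uses a BP network classifier to predict which links lead to relevant pages. The record gives no details of the network architecture, input features, or training data, so the following is only a minimal sketch of the general idea under stated assumptions: a one-hidden-layer network trained by backpropagation on keyword features of anchor text, driving a best-first crawl over a small in-memory link graph. The vocabulary, training examples, URLs, threshold, and all function and class names are hypothetical and are not taken from the paper.

```python
import heapq
import math
import random

# Hypothetical keyword vocabulary; the paper's real features are not given here.
VOCAB = ["university", "admissions", "faculty", "shop", "casino"]

def features(anchor_text):
    """Binary bag-of-words vector over VOCAB for a link's anchor text."""
    words = anchor_text.lower().split()
    return [1.0 if w in words else 0.0 for w in VOCAB]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """One-hidden-layer feed-forward network trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's weights; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        x_ext = x + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x_ext)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train(self, samples, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, y in samples:
                hidden, out = self.forward(x)
                x_ext, h_ext = x + [1.0], hidden + [1.0]
                # Output error term for cross-entropy loss with a sigmoid output.
                delta_out = out - y
                # Propagate the error back through the hidden sigmoid units.
                delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                                for j in range(len(hidden))]
                for j in range(len(h_ext)):
                    self.w_out[j] -= lr * delta_out * h_ext[j]
                for j, row in enumerate(self.w_hidden):
                    for i in range(len(row)):
                        row[i] -= lr * delta_hidden[j] * x_ext[i]

def crawl(seed, web, net, threshold=0.5, max_pages=10):
    """Best-first crawl: follow only links the classifier scores as relevant."""
    frontier = [(-1.0, seed)]          # min-heap on negated score
    seen, visited = {seed}, []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in web.get(url, []):
            if target in seen:
                continue
            _, score = net.forward(features(anchor))
            if score >= threshold:     # links scored below the threshold are dropped
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited

# Hypothetical labelled anchor texts (1 = university-related, 0 = not).
TRAINING = [("university admissions", 1), ("faculty of science", 1),
            ("university faculty", 1), ("online shop", 0),
            ("casino bonus", 0), ("shop casino deals", 0)]

# Toy link graph standing in for fetched pages: url -> [(anchor text, target url)].
WEB = {
    "http://example.edu/": [("university admissions", "http://example.edu/admissions"),
                            ("online shop", "http://shop.example.com/"),
                            ("faculty of science", "http://example.edu/science")],
    "http://example.edu/admissions": [("casino bonus", "http://casino.example.com/")],
    "http://example.edu/science": [("university faculty", "http://example.edu/staff")],
}

net = BPNetwork(n_in=len(VOCAB), n_hidden=3)
net.train([(features(text), label) for text, label in TRAINING])
visited = crawl("http://example.edu/", WEB, net)
print(visited)
```

The in-memory graph replaces real HTTP fetching so the sketch stays self-contained; a real crawler would also need page downloading, link extraction, politeness rules, and features drawn from more than anchor text.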