• DocumentCode
    2069898
  • Title
    Web Page's Blocks Based Topical Crawler
  • Author
    Zhang, Weifeng; Xu, Baowen; Lu, Hong
  • Author_Institution
    Coll. of Comput., Nanjing Univ. of Posts & Telecommun., Nanjing, China
  • fYear
    2008
  • fDate
    18-19 Dec. 2008
  • Firstpage
    44
  • Lastpage
    49
  • Abstract
    Link context has been widely used in information retrieval and classification. In topical (vertical) crawlers, link contexts are used to predict whether links are related to the target topics. The context of a link usually consists of the link's anchor text, the entire text of the web page, or the words within a fixed-size window around the link. However, the entire page text often covers too many themes, anchor text alone carries too little information, and a suitable window size is hard to determine. In this paper, we propose determining the scope of link context by using web page blocks, since links within the same block are more closely related to one another. A neural network based on corner classification is used to represent and filter topics. Our experiments show that web crawlers using block-based link context achieve better accuracy, and that the corner classification neural network is suitable for representing and filtering topics.
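    The abstract describes the approach only at a high level. The minimal Python sketch below illustrates one possible reading of it under stated assumptions, and is not the authors' implementation: pages are assumed to be already segmented into blocks (the segmentation method is not described here), each link's context is taken to be the text of the block that contains it, contexts are encoded as binary presence vectors over a fixed topic vocabulary, and the topic filter is a generic CC4-style corner classification network (one hidden neuron per training sample, generalization radius r). All names (`Block`, `link_contexts`, `encode`, `CC4`), the vocabulary, and the example.org URLs are hypothetical.

```python
# Hypothetical sketch: block-based link context + a CC4-style corner
# classification topic filter. Not the authors' implementation.
import re
from dataclasses import dataclass

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


@dataclass
class Block:
    """A page region; assumed to come from some prior block-segmentation step."""
    text: str
    links: list[str]  # URLs whose anchors lie inside this block


def link_contexts(blocks: list[Block]) -> dict[str, list[str]]:
    """Use the tokens of the enclosing block as each link's context."""
    contexts: dict[str, list[str]] = {}
    for block in blocks:
        tokens = tokenize(block.text)
        for url in block.links:
            contexts.setdefault(url, []).extend(tokens)
    return contexts


def encode(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Assumed encoding: binary presence vector over a fixed topic vocabulary."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


class CC4:
    """CC4-style corner classification network: one hidden neuron per sample."""

    def __init__(self, radius: int = 1):
        self.radius = radius  # radius of generalization r (Hamming distance)
        self.hidden: list[tuple[list[int], int]] = []  # (input weights, bias weight)
        self.out: list[int] = []  # +1 for a relevant sample, -1 for an irrelevant one

    def fit(self, samples: list[list[int]], labels: list[int]) -> "CC4":
        for x, y in zip(samples, labels):
            weights = [1 if bit else -1 for bit in x]
            bias = self.radius - sum(x) + 1  # r - s + 1, s = number of 1s
            self.hidden.append((weights, bias))
            self.out.append(1 if y else -1)
        return self

    def predict(self, x: list[int]) -> int:
        total = 0
        for (weights, bias), w_out in zip(self.hidden, self.out):
            activation = sum(w * bit for w, bit in zip(weights, x)) + bias
            if activation > 0:  # equals r + 1 - d, so fires iff distance d <= r
                total += w_out
        return 1 if total > 0 else 0


if __name__ == "__main__":
    vocab = ["neural", "network", "crawler", "topic", "football", "recipe"]
    train = [
        (["neural", "network", "topic"], 1),
        (["crawler", "topic", "network"], 1),
        (["football", "recipe"], 0),
    ]
    net = CC4(radius=1).fit([encode(t, vocab) for t, _ in train], [y for _, y in train])
    page = [
        Block("Neural network crawler topic papers", ["http://example.org/a"]),
        Block("Football scores and recipe of the day", ["http://example.org/b"]),
    ]
    for url, ctx in link_contexts(page).items():
        print(url, net.predict(encode(ctx, vocab)))  # a -> 1, b -> 0
```

    For a stored sample with s ones, the bias r - s + 1 makes a hidden neuron's activation equal r + 1 - d, where d is the Hamming distance between the input and that sample, so the neuron fires only for inputs within distance r; the signed vote of the firing neurons then decides relevance. The radius and feature encoding the authors actually used are not given in the abstract.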
  • Keywords
    Web sites; neural nets; Web crawlers; Web page block; corner classification; link context; neural network; topical crawler; Artificial neural networks; Biological neural networks; Crawlers; Educational institutions; Information filtering; Information filters; Information retrieval; Neural networks; Search engines; Web pages; crawler; topic; web page block;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Service-Oriented System Engineering, 2008. SOSE '08. IEEE International Symposium on
  • Conference_Location
    Jhongli
  • Print_ISBN
    978-0-7695-3499-2
  • Electronic_ISBN
    978-0-7695-3499-2
  • Type
    conf
  • DOI
    10.1109/SOSE.2008.10
  • Filename
    4730461