• DocumentCode
    1666167
  • Title

    Research on a dynamic adjust crawling algorithm for guiding the topic crawler through Tunnels

  • Author

    Xu, Chang ; Jian-guo, Xu ; Bin, Jia

  • Author_Institution
    College of Information and Engineering, Shan Dong University of Science and Technology, Qingdao, China
  • fYear
    2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Tunneling has long been a central problem for topic crawlers. Building on the vector space model (VSM), this paper incorporates the text structure of web documents into the topic similarity measure, improving the VSM text classification algorithm so that its predictions are more accurate, and applies the result to a topic crawler algorithm that adjusts dynamically while passing through tunnels. The paper analyzes the influence of Web Community features and tunneling, and takes the genetic factors of parent and child pages into account in the web page similarity calculation. To address the shortcomings of the traditional tunnel method, it designs a new algorithm in which the crawler dynamically adjusts the K value during crawling according to a calculated strategy, so that Web Communities and tunnels form relatively complete thematic clusters and the web crawl rate improves.
  • Keywords
    Classification algorithms; Communities; Crawlers; Educational institutions; Heuristic algorithms; Prediction algorithms; Text categorization; Topic crawler; Topic similarity; Turnning; VSM; Web Community;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    E-Business and E-Government (ICEE), 2011 International Conference on
  • Conference_Location
    Shanghai, China
  • Print_ISBN
    978-1-4244-8691-5
  • Type

    conf

  • DOI
    10.1109/ICEBEG.2011.5884527
  • Filename
    5884527