  • DocumentCode
    2590793
  • Title
    Analyzing the Web Crawler as a Feed Forward Engine for an Efficient Solution to the Search Problem in the Minimum Amount of Time through a Distributed Framework
  • Author
    Qureshi, M. Atif; Younus, Arjumand; Rojas, Francisco
  • Author_Institution
    Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
  • fYear
    2010
  • fDate
    21-23 April 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    A web crawler forms the backbone of a search engine, and this backbone needs a careful reassessment that could enhance the efficiency of search engines. This paper conducts such a reassessment from a systems perspective by implementing and analyzing a web crawler, "VisionerBOT", as a feed forward engine for search engines using the MapReduce distributed programming model. Our crawler implementations revisit the classic OS debate of threads vs. events, and a significant contribution of our work is the conclusion that events are the ideal way forward for web crawlers. Furthermore, in implementing the feed forward mechanisms within the web crawler, we arrived at several important design considerations for the operating systems research community that can lead to a whole new class of operating systems.
  • Keywords
    Internet; distributed programming; operating systems (computers); search engines; MapReduce distributed programming model; VisionerBOT; Web crawler analysis; distributed framework; feed forward engine mechanism; operating system; search engine; search problem; Crawlers; Feeds; Internet; Operating systems; Search engines; Search problems; Service oriented architecture; Spine; Web server; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Applications (ICISA), 2010 International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-5941-4
  • Electronic_ISBN
    978-1-4244-5943-8
  • Type
    conf
  • DOI
    10.1109/ICISA.2010.5480411
  • Filename
    5480411