• DocumentCode
    1960063
  • Title
    Distributed Ontology-Driven Focused Crawling
  • Author
    Campos, Rui ; Rojas, O. ; Marin, Mario ; Mendoza, M.
  • Author_Institution
    Comput. Sci. Dept., Univ. de Santiago, Santiago, Chile
  • fYear
    2013
  • fDate
    Feb. 27 2013-March 1 2013
  • Firstpage
    108
  • Lastpage
    115
  • Abstract
    Focused crawlers are programs designed to download web pages that are relevant to specific topics. Using information gathered at run time, focused crawlers explore the web by following promising hyperlinks and fetching only pages that appear to be relevant. These crawlers are receiving increasing attention because they support the construction of vertical search engines, which let users focus on specific topics, provide higher accuracy, and reduce the computational cost of query processing. In this article, we introduce an efficient focused crawling strategy that uses a number of distributed focused crawlers to retrieve pages relevant to a given knowledge domain. We propose an ontology-based knowledge representation approach to drive the crawlers to specific segments of the web. Experimental results with actual samples of the web show the feasibility and efficiency of our strategy.
  • Keywords
    Internet; ontologies (artificial intelligence); query processing; search engines; Web pages; distributed ontology-driven focused crawling; efficient focused crawling strategy; fetching; ontology-based knowledge representation approach; promissory hyperlinks; query processing; vertical search engines; Crawlers; HTML; Ontologies; Search engines; Uniform resource locators; Vectors; Web pages; Focused crawling; ontologies; vertical search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on
  • Conference_Location
    Belfast
  • ISSN
    1066-6192
  • Print_ISBN
    978-1-4673-5321-2
  • Electronic_ISBN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2013.23
  • Filename
    6498540