• DocumentCode
    1594277
  • Title

    Applying semantic similarity measures to enhance topic-specific web crawling

  • Author

    Pesaranghader, Ahmad ; Mustapha, Norwati ; Pesaranghader, Ahmad

  • Author_Institution
    Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia
  • fYear
    2013
  • Firstpage
    205
  • Lastpage
    212
  • Abstract
    As the Internet grows rapidly, finding desired information becomes a tedious and time-consuming task. Topic-specific web crawlers, as utopian solutions, tackle this issue by traversing the Web and collecting information related to the topic of interest. Various methods have been proposed for this purpose. Nevertheless, they rarely consider the intended sense of the given topic, which can play an important role in finding relevant web pages. In this paper, we attempt to improve topic-specific web crawling by disambiguating the sense of the topic. This would avoid crawling irrelevant links associated with other senses of the topic. To this end, by considering the semantics of link hypertext, we employ the Lin semantic similarity measure in our crawler, named LinCrawler, to distinguish links related to the topic sense from the others. Moreover, we compare LinCrawler against TFCrawler, which considers only the frequency of terms in hypertexts. Experimental results show that LinCrawler outperforms TFCrawler in collecting more relevant web pages.
  • Keywords
    Web sites; data mining; hypermedia; information retrieval; semantic Web; Internet; Lin semantic similarity measure; LinCrawler; TFCrawler; Web data mining; Web pages; information collection; links hypertext semantic; topic sense disambiguation; topic sense-related links; topic-specific Web crawling; utopian solutions; Google; Iris; Search engines; Semantics; Unified modeling language; Vectors; Link Prediction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications (ISDA), 2013 13th International Conference on
  • Conference_Location
    Bangi
  • Print_ISBN
    978-1-4799-3515-4
  • Type

    conf

  • DOI
    10.1109/ISDA.2013.6920736
  • Filename
    6920736