• DocumentCode
    2429902
  • Title
    A semantic information retrieval model for focused crawling
  • Author
    Osuna-Ontiveros, Daniel ; Lopez-Arevalo, Ivan ; Sosa-Sosa, Victor
  • Author_Institution
    Inf. Technol. Lab., CINVESTAV - IPN, Tamaulipas, Mexico
  • fYear
    2011
  • fDate
    19-21 Oct. 2011
  • Firstpage
    285
  • Lastpage
    289
  • Abstract
    Users store large amounts of information on the Web, which makes the Internet a good place to search for information on any subject. Because of the sheer volume of information, some users prefer to search within specific websites that they consider interesting (e.g. www.wikipedia.com, news sites, etc.). Traditional models represent webpages by using term frequency or link structure to assign weights to the terms of a webpage. This paper presents a semantic information retrieval model for representing specific websites. The proposal integrates text mining algorithms based on natural language processing with traditional representation models, with the aim of improving the quality of the webpages retrieved by a search. Each webpage of the website is represented as a vector of topics instead of a vector of terms, and the query is represented as a vector of topics in the same way. Thus, a similarity measure can be applied between the query vector and the document vectors to retrieve the most relevant documents.
  • Keywords
    Internet; Web sites; document handling; information retrieval; natural language processing; search problems; document vector; focused crawling; search information; semantic information retrieval model; text mining algorithms; web page representation; Computational modeling; Google; Information retrieval; Mathematical model; Semantics; Text mining; Vectors; Semantic Web; Semantic representation model; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Next Generation Web Services Practices (NWeSP), 2011 7th International Conference on
  • Conference_Location
    Salamanca
  • Print_ISBN
    978-1-4577-1125-1
  • Type
    conf
  • DOI
    10.1109/NWeSP.2011.6088192
  • Filename
    6088192