DocumentCode
2429902
Title
A semantic information retrieval model for focused crawling
Author
Osuna-Ontiveros, Daniel ; Lopez-Arevalo, Ivan ; Sosa-Sosa, Victor
Author_Institution
Inf. Technol. Lab., CINVESTAV - IPN, Tamaulipas, Mexico
fYear
2011
fDate
19-21 Oct. 2011
Firstpage
285
Lastpage
289
Abstract
Computer users now store large amounts of information on the Web, which makes the Internet a good place to search for information on any subject. Because of the volume of information, some users prefer to search within specific websites that they consider interesting (e.g. www.wikipedia.com, news sites, etc.). Traditional models represent webpages using term frequencies or the link structure in order to assign weights to the terms of webpages. This paper presents a semantic information retrieval model to represent specific websites. The proposal integrates text mining algorithms based on natural language processing with traditional representation models, with the aim of improving the quality of the webpages retrieved by a search. Each webpage of the website is represented as a vector of topics instead of a vector of terms. The query is likewise represented as a vector of topics. Thus, a similarity measure can be applied between the query vector and the document vectors to retrieve the most relevant documents.
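The retrieval step described in the abstract (pages and query as topic vectors, ranked by a similarity measure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say how topics are derived or which similarity measure is used, so the hand-made TERM_TOPICS map, the helper names, and the choice of cosine similarity are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical term-to-topic map (assumption): the abstract does not say
# how topics are extracted, so a hand-made lookup stands in for that step.
TERM_TOPICS = {
    "crawler": "web_search",
    "query": "web_search",
    "index": "web_search",
    "ontology": "semantics",
    "meaning": "semantics",
    "goal": "football",
    "league": "football",
}
TOPICS = sorted(set(TERM_TOPICS.values()))  # fixed topic order for all vectors


def topic_vector(text):
    """Count how many terms of `text` fall under each topic."""
    counts = Counter(
        TERM_TOPICS[term] for term in text.lower().split() if term in TERM_TOPICS
    )
    return [float(counts[topic]) for topic in TOPICS]


def cosine(u, v):
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, pages):
    """Return page names ordered from most to least similar to the query."""
    q = topic_vector(query)
    scored = [(cosine(q, topic_vector(text)), name) for name, text in pages.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


pages = {
    "a": "the crawler builds an index for each query",
    "b": "league goal goal",
    "c": "ontology meaning index",
}
ranking = rank("crawler query", pages)
```

In this toy setup the query shares only the web_search topic with pages "a" and "c", so page "b" ranks last. A real system would replace TERM_TOPICS with the output of a topic extraction step.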
Keywords
Internet; Web sites; document handling; information retrieval; natural language processing; search problems; document vector; focused crawling; search information; semantic information retrieval model; text mining algorithms; web page representation; Computational modeling; Google; Information retrieval; Mathematical model; Semantics; Text mining; Vectors; Semantic Web; Semantic representation model; Web search
fLanguage
English
Publisher
IEEE
Conference_Titel
Next Generation Web Services Practices (NWeSP), 2011 7th International Conference on
Conference_Location
Salamanca
Print_ISBN
978-1-4577-1125-1
Type
conf
DOI
10.1109/NWeSP.2011.6088192
Filename
6088192