DocumentCode
124167
Title
Relevant Sources of Information Are Not Necessarily Popular Ones
Author
Noel, Romain ; Pauchet, Alexandre ; Grilheres, Bruno ; Malandain, Nicolas ; Vercouter, Laurent ; Brunessaux, Stephan
Author_Institution
LITIS / INSA de Rouen, AIRBUS DS, Val-de-Reuil, France
Volume
1
fYear
2014
fDate
11-14 Aug. 2014
Firstpage
310
Lastpage
317
Abstract
The constant growth of the Web in recent years has made it more difficult to discover new sources of information on a given topic. This is a prominent problem for Experts in Intelligence Analysis (EIA), who must search for pages on specific and sensitive topics. Because of their lack of popularity, or because they are poorly indexed due to their sensitive content, these pages are hard to find with traditional search engines. In this article, we describe a new Web source discovery system called DOWSER (Discovery Of Web Sources Evaluating Relevance). The goal of this system is to provide users with new sources of information related to their needs without considering the popularity of a page, unlike classic Information Retrieval tools. The expected result is a balance between relevance and originality, in the sense that the wanted pages are not necessarily popular. DOWSER relies on a user profile to focus its exploration of the Web, so that it collects and indexes only related Web documents. As queries can be insufficient to express sensitive and specific needs, the user's information needs are specified using the user's interests, represented by DBPedia resources [1] and keywords, both extracted from Web pages provided by the user. A series of experiments provides an empirical evaluation of DOWSER.
Keywords
Internet; Web sites; data mining; information needs; information retrieval; search engines; DOWSER; Discovery Of Web Sources Evaluating Relevance; EIA; Web documents; Web pages; Web source discovery system; World Wide Web; information retrieval tools; information sources; intelligence analysis experts; search engines; user information needs; user interests; user profile; Crawlers; Electronic mail; Search engines; Vectors; Web pages; Focused crawling; Information Retrieval; Ranking; Semantic Web; User modelling; Web source discovery;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Warsaw
Type
conf
DOI
10.1109/WI-IAT.2014.49
Filename
6927558