Title :
Relevant Sources of Information Are Not Necessarily Popular Ones
Author :
Noel, Romain ; Pauchet, Alexandre ; Grilheres, Bruno ; Malandain, Nicolas ; Vercouter, Laurent ; Brunessaux, Stephan
Author_Institution :
LITIS / INSA de Rouen, AIRBUS DS, Val-de-Reuil, France
Abstract :
The constant growth of the Web in recent years has made the discovery of new sources of information on a given topic more difficult. This is a prominent problem for Experts in Intelligence Analysis (EIA), who must search for pages on specific and sensitive topics. Because they lack popularity, or because they are poorly indexed due to their sensitive content, these pages are hard to find with traditional search engines. In this article, we describe a new Web source discovery system called DOWSER (Discovery Of Web Sources Evaluating Relevance). Unlike classic Information Retrieval tools, this system aims to provide users with new sources of information related to their needs without considering the popularity of a page. The expected result is a balance between relevance and originality, in the sense that the sought pages are not necessarily popular. DOWSER relies on a user profile to focus its exploration of the Web, so that it collects and indexes only related Web documents. As queries can be insufficient to express sensitive and specific needs, the user's information needs are specified through the user's interests, represented by DBpedia resources [1] and keywords, both extracted from Web pages provided by the user. A series of experiments provides an empirical evaluation of DOWSER.
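The abstract only outlines DOWSER's pipeline, so the following is a minimal Python sketch of the general idea it names (profile-driven focused crawling with no popularity signal), not DOWSER's actual algorithm. Assumptions, all hypothetical: the user profile is reduced to keyword weights taken from raw term frequency over the user's example pages (DBpedia resources are omitted), relevance is plain cosine similarity, and an in-memory `web` dict stands in for HTTP fetching. The names `build_profile`, `cosine`, `focused_crawl`, `STOPWORDS`, `threshold` and `max_pages` are illustrative only.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "for", "is", "are"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_profile(example_pages, top_k=20):
    """Assumed stand-in for profile extraction: top-k terms by frequency."""
    counts = Counter()
    for text in example_pages:
        counts.update(tokenize(text))
    return dict(counts.most_common(top_k))


def cosine(profile, text):
    """Cosine similarity between the profile weights and a page's term counts."""
    tf = Counter(tokenize(text))
    dot = sum(w * tf[t] for t, w in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(c * c for c in tf.values()))
    if norm_p == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_p * norm_d)


def focused_crawl(web, seeds, profile, threshold=0.1, max_pages=100):
    """Best-first crawl: pages are scored only by relevance to the profile.

    No popularity signal (in-link counts, PageRank, ...) is used anywhere.
    """
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    index = {}
    while frontier and len(index) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # unreachable or unknown page
        text, links = web[url]
        score = cosine(profile, text)
        if score < threshold:
            continue  # off-topic: neither indexed nor expanded
        index[url] = score
        for link in links:
            if link not in seen:
                seen.add(link)
                # Children inherit the parent's relevance as crawl priority.
                heapq.heappush(frontier, (-score, link))
    return sorted(index.items(), key=lambda kv: kv[1], reverse=True)


web = {
    "a": ("smuggling network arms trafficking route", ["b", "c"]),
    "b": ("celebrity gossip football scores", ["d"]),
    "c": ("arms trafficking forum smuggling", []),
    "d": ("arms smuggling route details", []),
}
profile = build_profile(["arms trafficking smuggling route"])
print(focused_crawl(web, ["a"], profile, threshold=0.2))
```

In this toy run, page `d` is relevant but is never reached because its only in-link comes from the off-topic page `b`; this illustrates why a focused crawler's choice of what to expand matters as much as how it scores pages.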
Keywords :
Internet; Web sites; data mining; information needs; information retrieval; search engines; DOWSER; Discovery Of Web Sources Evaluating Relevance; EIA; Web documents; Web pages; Web source discovery system; World Wide Web; information retrieval tools; information sources; intelligence analysis experts; user information needs; user interests; user profile; Crawlers; Electronic mail; Vectors; Focused crawling; Information Retrieval; Ranking; Semantic Web; User modelling; Web source discovery;
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
DOI :
10.1109/WI-IAT.2014.49