Relevant Sources of Information Are Not Necessarily Popular Ones

Author

Noel, Romain ; Pauchet, Alexandre ; Grilheres, Bruno ; Malandain, Nicolas ; Vercouter, Laurent ; Brunessaux, Stephan

Author_Institution

LITIS / INSA de Rouen, AIRBUS DS, Val-de-Reuil, France

Volume

1

fYear

2014

fDate

11-14 Aug. 2014

Firstpage

310

Lastpage

317

Abstract

The constant growth of the Web in recent years has made it more difficult to discover new sources of information on a given topic. This is a prominent problem for Experts in Intelligence Analysis (EIA), who must search for pages on specific and sensitive topics. Because they lack popularity or are poorly indexed due to their sensitive content, these pages are hard to find with traditional search engines. In this article, we describe a new Web source discovery system called DOWSER (Discovery Of Web Sources Evaluating Relevance). The goal of this system is to provide users with new sources of information related to their needs without considering the popularity of a page, unlike classic Information Retrieval tools. The expected result is a balance between relevance and originality, in the sense that the wanted pages are not necessarily popular. DOWSER relies on a user profile to focus its exploration of the Web so that it collects and indexes only related Web documents. As queries can be insufficient to express sensitive and specific needs, the user's information needs are specified using the user's interests, represented by DBPedia resources [1] and keywords, both extracted from Web pages provided by the user. A series of experiments provides an empirical evaluation of DOWSER.

Keywords

Internet; Web sites; data mining; information needs; information retrieval; search engines; DOWSER; Discovery Of Web Sources Evaluating Relevance; EIA; Web documents; Web pages; Web source discovery system; World Wide Web; information retrieval tools; information sources; intelligence analysis experts; user information needs; user interests; user profile; Crawlers; Electronic mail; Vectors; Focused crawling; Ranking; Semantic Web; User modelling; Web source discovery

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on

Conference_Location

Warsaw

Type

conf

DOI

10.1109/WI-IAT.2014.49

Filename

6927558