• DocumentCode
    124167
  • Title

    Relevant Sources of Information Are Not Necessarily Popular Ones

  • Author

    Noel, Romain ; Pauchet, Alexandre ; Grilheres, Bruno ; Malandain, Nicolas ; Vercouter, Laurent ; Brunessaux, Stephan

  • Author_Institution
    LITIS / INSA de Rouen, AIRBUS DS, Val-de-Reuil, France
  • Volume
    1
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    310
  • Lastpage
    317
  • Abstract
    The constant growth of the Web in recent years has made it more difficult to discover new sources of information on a given topic. This is a prominent problem for Experts in Intelligence Analysis (EIA), who must search for pages on specific and sensitive topics. Because they lack popularity or are poorly indexed due to their sensitive content, these pages are hard to find with traditional search engines. In this article, we describe a new Web source discovery system called DOWSER (Discovery Of Web Sources Evaluating Relevance). The goal of this system is to provide users with new sources of information related to their needs without considering the popularity of a page, unlike classic Information Retrieval tools. The expected result is a balance between relevance and originality, in the sense that the wanted pages are not necessarily popular. DOWSER relies on a user profile to focus its exploration of the Web, so that it collects and indexes only related Web documents. As search queries can be insufficient to express sensitive and specific needs, the user's information needs are specified using the user's interests, represented by DBPedia resources [1] and keywords, both extracted from Web pages provided by the user. A series of experiments provides an empirical evaluation of DOWSER.
  • Keywords
    Internet; Web sites; data mining; information needs; information retrieval; search engines; DOWSER; Discovery Of Web Sources Evaluating Relevance; EIA; Web documents; Web pages; Web source discovery system; World Wide Web; information retrieval tools; information sources; intelligence analysis experts; user information needs; user interests; user profile; Crawlers; Electronic mail; Vectors; Focused crawling; Ranking; Semantic Web; User modelling; Web source discovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2014.49
  • Filename
    6927558