• DocumentCode
    1796735
  • Title

    A Software Architecture for Progressive Scanning of On-line Communities

  • Author

    Baldoni, Roberto ; D'Amore, Fabrizio ; Mecella, Massimo ; Ucci, Daniele

  • Author_Institution
    Dipt. di Ing. Inf. Autom. e Gestionale, Cyber-Intell. & Inf. Security Center, Sapienza Univ. di Roma, Rome, Italy
  • fYear
    2014
  • fDate
    June 30 2014-July 3 2014
  • Firstpage
    207
  • Lastpage
    212
  • Abstract
    We consider a set of on-line communities (e.g., news, blogs, Google groups, Web sites). The content of a community is continuously updated by its users, and such updates can be seen by users of other communities. Thus, when creating an update, a user may be influenced by one or more earlier updates, which creates a semantic causal relationship among updates. Followed transitively, these relationships make it possible to trace how a piece of information flows across communities. The paper presents a software architecture that progressively scans a set of on-line communities in order to detect such semantic causal relationships. The architecture includes a crawler, large-scale storage, a distributed indexing system, and a mining system. The paper mainly focuses on crawling and indexing.
  • Keywords
    social networking (online); software architecture; Google groups; Web sites; blogs; crawler; distributed indexing system; information flows; large scale storage; mining system; news; online communities; progressive scanning; semantic causal relationship; update; Communities; Computer architecture; Crawlers; Data mining; Indexing; MapR; Nutch; On-line communities
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on
  • Conference_Location
    Madrid
  • ISSN
    1545-0678
  • Print_ISBN
    978-1-4799-4182-7
  • Type

    conf

  • DOI
    10.1109/ICDCSW.2014.37
  • Filename
    6888863