DocumentCode
1796735
Title
A Software Architecture for Progressive Scanning of On-line Communities
Author
Baldoni, Roberto ; D'Amore, Fabrizio ; Mecella, Massimo ; Ucci, Daniele
Author_Institution
Dipt. di Ing. Inf. Autom. e Gestionale, Cyber-Intell. & Inf. Security Center, Sapienza Univ. di Roma, Rome, Italy
fYear
2014
fDate
June 30 2014-July 3 2014
Firstpage
207
Lastpage
212
Abstract
We consider a set of on-line communities (e.g., news, blogs, Google groups, Web sites). The content of a community is continuously updated by its users, and such updates can be seen by users of other communities. Thus, when creating an update, a user could be influenced by one or more earlier updates, which creates a semantic causal relationship among updates. Followed transitively, these relationships make it possible to trace how information flows across communities. The paper presents a software architecture that progressively scans a set of on-line communities in order to detect such semantic causal relationships. The architecture includes a crawler, a large-scale storage, a distributed indexing system, and a mining system. The paper mainly focuses on crawling and indexing.
Keywords
social networking (online); software architecture; Google groups; Web sites; blogs; crawler; distributed indexing system; information flows; large scale storage; mining system; news; online communities; progressive scanning; semantic causal relationship; software architecture; update; Communities; Computer architecture; Crawlers; Data mining; Indexing; Software architecture; MapR; Nutch; On-line communities; progressive scanning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on
Conference_Location
Madrid
ISSN
1545-0678
Print_ISBN
978-1-4799-4182-7
Type
conf
DOI
10.1109/ICDCSW.2014.37
Filename
6888863