DocumentCode :
1972384
Title :
Digging up social structures from documents on the web
Author :
Gessiou, E. ; Volanis, S. ; Athanasopoulos, Elias ; Markatos, Evangelos P. ; Ioannidis, Sotiris
Author_Institution :
Polytech. Inst. of New York Univ., Brooklyn, NY, USA
fYear :
2012
fDate :
3-7 Dec. 2012
Firstpage :
744
Lastpage :
750
Abstract :
We collected more than ten million Microsoft Office documents from public websites, analyzed the metadata stored in each document, and extracted information related to social activities. Our analysis revealed precisely identified cliques of users who edit, revise, and collaborate on industrial and military content. We also examined cliques in documents downloaded from Fortune-500 company websites, constructed their graphs, and measured their properties. The graphs contained many connected components and exhibited social properties. A priori knowledge of a company's social graph may significantly help an adversary launch targeted attacks, such as targeted advertisements and phishing emails. Our study demonstrates the privacy risks associated with metadata by cross-correlating all members identified in a clique with Twitter users. We show that it is possible to match authors collaborating on a document with Twitter accounts. To the best of our knowledge, this study is the first to identify individuals and create social cliques based solely on information derived from document metadata. Our study raises major concerns about privacy leakage through document metadata.
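The graph step described in the abstract (linking people who appear together in a document's metadata, then measuring connected components) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a hypothetical record layout in which each document is a dict with a "names" list, for example values gathered from Office metadata fields such as dc:creator and cp:lastModifiedBy. Every pair of people named in the same document is joined by an edge, so each document contributes a clique, and components are then found by breadth-first search.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_collaboration_graph(docs):
    """Return an adjacency map: person -> set of people they share a document with.

    Assumes a hypothetical layout where each doc is a dict with a "names" list.
    """
    graph = defaultdict(set)
    for doc in docs:
        # Normalize and de-duplicate the names found in one document's metadata.
        people = {n.strip() for n in doc.get("names", []) if n and n.strip()}
        for person in people:
            graph.setdefault(person, set())  # keep single-author documents as isolated nodes
        # Every pair of people in the same document is linked, so each document forms a clique.
        for a, b in combinations(sorted(people), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph, each as a sorted list of names."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        # Breadth-first search from an unvisited node collects its whole component.
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components
```

For example, documents naming {alice, bob}, {bob, carol}, {dave}, and {erin, frank} yield three components: alice, bob, and carol; dave alone; and erin with frank.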
Keywords :
data privacy; document handling; graph theory; meta data; social networking (online); Fortune-500 company Websites; Microsoft Office documents; Twitter accounts; company social graph; document metadata; information extraction; metadata analysis; phishing emails; privacy leakage; privacy risks; public Websites; social activities; social cliques; social properties; social structures; targeted advertisements;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Global Communications Conference (GLOBECOM), 2012 IEEE
Conference_Location :
Anaheim, CA
ISSN :
1930-529X
Print_ISBN :
978-1-4673-0920-2
Electronic_ISBN :
1930-529X
Type :
conf
DOI :
10.1109/GLOCOM.2012.6503202
Filename :
6503202