DocumentCode :
124149
Title :
Finding the Needle in the Haystack: Identifying Business Communities in Internet Traffic
Author :
Weigert, S. ; Hiltunen, Matti A. ; Fetzer, Christof
Author_Institution :
Tech. Univ. Dresden, Dresden, Germany
Volume :
1
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
175
Lastpage :
182
Abstract :
Identifying real-world business communities (e.g., energy, finance, defense) in Internet traffic is a challenging problem, but it would be valuable, for example, for the construction of better intrusion detection techniques. Seed-based community detection identifies a community in a graph by iteratively adding the 'closest' vertices to an initial set of seed vertices that are known to belong to the community. Previous research focused on unambiguous networks, where edges describe a specific intention in a fixed domain (e.g., a 'friend' in a social network), and on tightly-knit communities whose members are better connected to each other ('close') than to the rest of the network. However, looking at a complete day of raw Internet traffic, we found that (1) the intent of a communication is ambiguous (e.g., ad downloads are indistinguishable from web-page downloads) and (2) real-world industries manifest themselves as loosely-coupled communities, i.e., with more edges to non-community members than to community members. We present a new seed-based community detection algorithm that provides higher precision and recall in our setting than the related work. We show that this enables the detection of loosely-knit communities using three sample industries. For instance, our solution detected 111 individual energy companies with only 6 false positives, starting from eight ISOs (Independent System Operators) and RTOs (Regional Transmission Operators) in the US.
Keywords :
Internet; business data processing; graph theory; iterative methods; security of data; social networking (online); telecommunication traffic; Haystack; ISO; Internet traffic; RTO; Web-page downloads; business communities; graph community; independent system operators; intrusion detection techniques; regional transmission operators; seed-based community detection algorithm; seed-vertices; tightly-knit communities; unambiguous networks; Communities; Companies; IP networks; Industries; Internet; Ports (Computers); Social network services; Social networks; clustering; community discovery; seed sets;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
Type :
conf
DOI :
10.1109/WI-IAT.2014.31
Filename :
6927540