DocumentCode
3753602
Title
Unsupervised Detection of Web Trackers
Author
Hassan Metwalley;Stefano Traverso;Marco Mellia
Author_Institution
Politec. di Torino, Turin, Italy
fYear
2015
Firstpage
1
Lastpage
6
Abstract
When browsing, users are consistently tracked by parties whose business builds on the value of collected data. The privacy implications are serious. Consumers and companies worry about the information they unknowingly expose to the outside world, and they call for mechanisms to curb this leakage. Existing countermeasures to web tracking rely on hostname blacklists whose origin is impossible to know and which must be continuously updated. This paper presents a novel, unsupervised methodology that leverages application-level traffic logs to automatically detect services running some tracking activity, thus enabling the generation of curated blacklists. The methodology builds on an algorithm that pinpoints pieces of information containing user identifiers exposed in URL queries in HTTP(S) transactions. We validate our algorithm over an artificial dataset obtained by visiting the top 200 most popular websites in the Alexa rank. Results are excellent. Our algorithm identifies 34 new third-party trackers not present in available blacklists. Analyzing the output of our algorithm also reveals some privacy-related interactions. For instance, we observe scenarios clearly hinting at the Cookie Matching practice, in which information about users' activity gets shared across several different third parties.
Keywords
"Browsers","Algorithm design and analysis","Uniform resource locators","Privacy","Target tracking","Companies"
Publisher
ieee
Conference_Titel
Global Communications Conference (GLOBECOM), 2015 IEEE
Type
conf
DOI
10.1109/GLOCOM.2015.7417499
Filename
7417499