Title :
Can web pages be classified using anonymized TCP/IP headers?
Author :
Sanders, Sean ; Kaur, Jasleen
Author_Institution :
Univ. of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Date :
April 26, 2015 - May 1, 2015
Abstract :
Web page classification is useful in many domains, including ad targeting, traffic modeling, and intrusion detection. In this paper, we investigate whether learning-based techniques can be used to classify web pages based only on the anonymized TCP/IP headers of the traffic generated when a web page is visited. We do this in three steps. First, we select informative TCP/IP features for a given downloaded web page, and study which of these remain stable over time and consistent across client browser platforms. Second, we use the selected features to evaluate four different labeling schemes and learning-based classification methods for web page classification. Lastly, we empirically study the effectiveness of the classification methods for real-world applications.
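The pipeline described in the abstract (derive features from headers, then classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the features, labeling schemes, or classifiers used, so the `Packet` fields, the four features, the 1-nearest-neighbor rule, and the labels "text-heavy" and "video" are hypothetical stand-ins rather than the authors' method.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical record for one anonymized packet header. The field names are
# illustrative and are not taken from the paper.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port size syn")


def extract_features(packets, client_ip):
    """Summarize one page download as a small numeric feature vector."""
    # Distinct (anonymized) servers the client sent packets to.
    servers = {p.dst_ip for p in packets if p.src_ip == client_ip}
    # Connections opened by the client, counted by its SYN packets.
    connections = sum(1 for p in packets if p.syn and p.src_ip == client_ip)
    # Bytes received by and sent from the client.
    bytes_down = sum(p.size for p in packets if p.dst_ip == client_ip)
    bytes_up = sum(p.size for p in packets if p.src_ip == client_ip)
    return [len(servers), connections, bytes_down, bytes_up]


def nearest_label(features, training):
    """Return the label of the closest training vector (1-nearest neighbor)."""
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(training, key=lambda item: distance(features, item[0]))[1]


# Made-up training vectors and labels, for illustration only.
training = [
    ([2, 3, 40_000, 2_000], "text-heavy"),
    ([12, 30, 3_000_000, 60_000], "video"),
]

# A made-up download: the client contacts two servers over two connections.
download = [
    Packet("10.0.0.1", "10.0.0.2", 50000, 80, 60, True),
    Packet("10.0.0.2", "10.0.0.1", 80, 50000, 1500, False),
    Packet("10.0.0.1", "10.0.0.3", 50001, 443, 60, True),
    Packet("10.0.0.3", "10.0.0.1", 443, 50001, 1500, False),
]

features = extract_features(download, "10.0.0.1")
label = nearest_label(features, training)
```

The sketch reads only addresses, ports, sizes, and the SYN flag, which is the kind of information that survives header anonymization. The raw byte counts dominate the distance here, so a practical version would likely scale the features before comparing them.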
Keywords :
Web sites; online front-ends; security of data; telecommunication traffic; transport protocols; TCP/IP header; Web page classification; ad targeting; client browser platforms; intrusion detection; labeling schemes; learning-based classification methods; learning-based techniques; traffic modeling; Browsers; Feature extraction; IP networks; Labeling; Navigation; Streaming media; Web pages; Traffic Classification; Web Page Measurement
Conference_Title :
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location :
Kowloon
DOI :
10.1109/INFOCOM.2015.7218614