Title :
FastCARS: fast, correlation-aware sampling for network data mining
Author :
Pan, Jia-Yu ; Seshan, Srinivasan ; Faloutsos, Christos
Author_Institution :
Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Technology trends are making it increasingly difficult to observe and record the large amount of data generated by high-speed links. Traffic sampling techniques provide a simple alternative that reduces the volume of data collected. Unfortunately, existing sampling techniques largely hide any temporal relationship in the recorded data. Our proposed method, "FastCARS", naturally captures statistics for packets that are 1, 2 or more steps apart. It has the following properties: (a) provides accurate measurements of a full trace's statistics; (b) is simple and can be easily implemented; (c) captures correlations between successive packets, as well as packets that are further apart; (d) generalizes previously proposed sampling methods and includes them as special cases; (e) is scalable and flexible enough to account for prior knowledge about the characteristics of traces. We also propose several new tools for network data mining that use the information provided by FastCARS. The experimental results on multiple real-world datasets (233 Mb in total) show that the proposed FastCARS sampling method and these new data mining tools are effective. With these tools, we show that the independence assumption of packet arrival is not correct, and that packet trains may not be the only cause of dependence among arrivals.
Keywords :
data communication; data mining; sampling methods; telecommunication networks; telecommunication traffic; FastCARS; correlation-aware sampling; full trace statistics; network data mining; packet trains; traffic sampling methods; Algorithm design and analysis; Computer science; Data mining; Histograms; Monitoring; Routing; Sampling methods; Statistics; Telecommunication traffic; Velocity measurement
Conference_Title :
Global Telecommunications Conference, 2002. GLOBECOM '02. IEEE
Print_ISBN :
0-7803-7632-3
DOI :
10.1109/GLOCOM.2002.1189013