• DocumentCode
    2404875
  • Title
    Data mining meets performance evaluation: fast algorithms for modeling bursty traffic
  • Author
    Wang, Mengzhi ; Madhyastha, Tara ; Chan, Ngai Hang ; Papadimitriou, Spiros ; Faloutsos, Christos
  • Author_Institution
    Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    507
  • Lastpage
    516
  • Abstract
    Network, Web, and disk I/O traffic are usually bursty and self-similar and therefore cannot be modeled adequately with Poisson arrivals. However, we wish to model these types of traffic and generate realistic traces, because of obvious applications for disk scheduling, network management, and Web server design. Previous models (such as fractional Brownian motion and FARIMA) tried to capture the 'burstiness'. However, these models require too many parameters to fit, prohibitively large (quadratic) time to generate large traces, or both. We propose a simple, parsimonious method, the b-model, which solves both problems: it requires just one parameter, and it can easily generate large traces. It also has further attractive properties: (a) With our proposed estimation algorithm, it requires just a single pass over the actual trace to estimate b. For example, a one-day-long disk trace at millisecond resolution contains about 86 million data points and requires about 3 minutes for model fitting and 5 minutes for generation. (b) The resulting synthetic traces are very realistic: our experiments on real disk and Web traces show that our synthetic traces match the real ones very well in terms of queuing behavior.
  • Keywords
    Internet; data mining; input-output programs; performance evaluation; sequences; telecommunication traffic; Web server design; Web traffic; b-model; bursty traffic modeling; disk I/O traffic; disk scheduling; estimation algorithm; fast algorithms; network management; network traffic; queuing behavior; traces; Brownian motion; Computer science; Ethernet networks; Parameter estimation; Scheduling algorithm; Statistics; Traffic control; Web server
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2002. Proceedings. 18th International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1531-2
  • Type
    conf
  • DOI
    10.1109/ICDE.2002.994770
  • Filename
    994770