• DocumentCode
    65743
  • Title

    On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling

  • Author

    Murai, F. ; Ribeiro, Bernardete ; Towsley, Don ; Pinghui Wang

  • Author_Institution
    Sch. of Sci., Univ. of Massachusetts Amherst, Amherst, MA, USA
  • Volume
    31
  • Issue
    6
  • fYear
    2013
  • fDate
    Jun-13
  • Firstpage
    1017
  • Lastpage
    1025
  • Abstract
    In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is difficult to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction lies below the threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.
  • Keywords
    Internet; telecommunication network planning; transport protocols; Internet; TCP/IP flow size distributions; capacity planning; network theory; nonoverlapping sets; original set size distribution; set size distribution estimation problem; sharp threshold; Cram'er-Rao lower bound; Fisher information; set size distribution estimation;
  • fLanguage
    English
  • Journal_Title
    Selected Areas in Communications, IEEE Journal on
  • Publisher
    ieee
  • ISSN
    0733-8716
  • Type

    jour

  • DOI
    10.1109/JSAC.2013.130604
  • Filename
    6517106