• DocumentCode
    747525
  • Title

    Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm

  • Author

    Louchard, Guy ; Szpankowski, Wojciech

  • Author_Institution
    Lab. d´´Inf. Theorique, Univ. Libre de Bruxelles, Belgium
  • Volume
    41
  • Issue
    2
  • fYear
    1995
  • fDate
    3/1/1995 12:00:00 AM
  • Firstpage
    478
  • Lastpage
    488
  • Abstract
    Consider the parsing algorithm developed by Lempel and Ziv (1978) that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: number of phrases, the size of a phrase, the number of phrases of given size, and so forth. In this paper, we focus on the size of a randomly selected phrase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal probabilities of symbols generation (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed with mean and variance explicitly computed. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a randomly selected node). The latter finding is proved by a technique that belongs to the toolkit of the “analytical analysis of algorithms”, and it seems to be novel in the context of data compression
  • Keywords
    binary sequences; grammars; random processes; search problems; tree searching; trees (mathematics); Lempel-Ziv parsing algorithm; algorithms; analytical analysis; asymmetric Bernoulli model; asymptotic normal distribution; average profile; data compression; digital search tree representation; limiting distribution; mean; memoryless source; randomly selected phrase; unequal symbol generation probabilities; variable phrases; variance; Algorithm design and analysis; Application software; Computer science; Data compression; Distributed computing; Heart; Helium; Partitioning algorithms; Testing;
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/18.370149
  • Filename
    370149