• DocumentCode
    740016
  • Title

    A Preadapted Universal Switch Distribution for Testing Hilberg’s Conjecture

  • Author

    Debowski, Lukasz

  • Author_Institution
    Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  • Volume
    61
  • Issue
    10
  • fYear
    2015
  • Firstpage
    5708
  • Lastpage
    5715
  • Abstract
    Hilberg’s conjecture about natural language states that the mutual information between two adjacent long blocks of text grows like a power of the block length. The exponent in this statement can be upper bounded using the pointwise mutual information estimate computed for a carefully chosen code. The lower the compression rate of the code, the better the bound, but the code is required to be universal. To improve a previously reported upper bound for Hilberg’s exponent, in this paper we introduce two novel universal codes, called the plain switch distribution and the preadapted switch distribution. Generally speaking, switch distributions are certain mixtures of adaptive Markov chains of varying orders with some additional communication to avoid the so-called catch-up phenomenon. The advantage of these distributions is that they both achieve a low compression rate and are guaranteed to be universal. Using the switch distributions, we find that a sample of English text is non-Markovian, with Hilberg’s exponent being ≤0.83, which improves on the previous bound of ≤0.94 obtained using the Lempel–Ziv code.
  • Keywords
    Adaptation models; Markov processes; Mutual information; Natural languages; Q measurement; Random variables; Switches; Hilberg’s conjecture; natural language; universal coding
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2015.2466693
  • Filename
    7185415