• DocumentCode
    3420845
  • Title

    On the complexity of optimal grammar-based compression

  • Author

    Arpe, Jan ; Reischuk, Rüdiger

  • Author_Institution
    Institut fur Theor. Informatik, Universitat zu Lubeck, Germany
  • fYear
    2006
  • fDate
    28-30 March 2006
  • Firstpage
    173
  • Lastpage
    182
  • Abstract
    Given a string, the task of grammar-based compression is to find a small context-free grammar that generates exactly that string. We investigate the relationship between grammar-based compression of strings over unbounded and bounded alphabets. Specifically, we show how to transform a grammar for a string over an unbounded alphabet into a grammar for a block coding of that string over a fixed bounded alphabet and vice versa. From these constructions, we obtain asymptotically tight relationships between the minimum grammar sizes for strings and their block codings. Furthermore, we exploit an improved bound of our construction for overlap-free block codings to show that a polynomial time algorithm for approximating the minimum grammar for binary strings within a factor of c yields a polynomial time algorithm for approximating the minimum grammar for strings over arbitrary alphabets within a factor of 24c + ∈ (for arbitrary ∈ > 0). Currently, the latter problem is known to be NP-hard to approximate within a factor of 8569/8568. Since there is some hope to prove a nonconstant lower bound, our results may provide a first step towards solving the long standing open question whether minimum grammar-based compression of binary strings is NP-complete.
  • Keywords
    block codes; computational complexity; context-free grammars; data compression; NP-complete problem; NP-hard problem; binary strings; context-free grammar; nonconstant lower bound; optimal grammar-based compression; overlap-free block codings; polynomial time algorithm; unbounded alphabet; Approximation algorithms; Block codes; Data compression; Greedy algorithms; Polynomials; Production; Size measurement; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference, 2006. DCC 2006. Proceedings
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2545-8
  • Type

    conf

  • DOI
    10.1109/DCC.2006.59
  • Filename
    1607252