• DocumentCode
    2945582
  • Title
    Improving PPM Algorithm Using Dictionaries
  • Author
    Yichuan Hu; Jianzhong Zhang; Farooq Khan; Ying Li
  • Author_Institution
    Dept. of ESE, Univ. of Pennsylvania, Philadelphia, PA, USA
  • fYear
    2011
  • fDate
    29-31 March 2011
  • Firstpage
    459
  • Lastpage
    459
  • Abstract
    We propose a method to improve the traditional character-based PPM text compression algorithm for natural languages. Considering a text file as a sequence of alternating words and non-words, the basic idea of our algorithm is to encode non-words and word prefixes using character-based context models and to encode word suffixes using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a single unit and thus improve compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to signal a switch between context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Details of the algorithm are described below.
  • Keywords
    data compression; dictionaries; natural language processing; text analysis; alternating words; character based PPM text compression algorithm; character based context models; dictionary models; natural languages; non words; words suffixes; Computational modeling; Context; Context modeling; Data compression; Decoding; Dictionaries; Encoding; Dictionary model; Markov model; PPM; Text compression;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Compression Conference (DCC), 2011
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-61284-279-0
  • Type
    conf
  • DOI
    10.1109/DCC.2011.63
  • Filename
    5749516
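The abstract's core idea (split text into alternating words and non-words, code non-words and word prefixes with a character model, then code the rest of a dictionary word as a single symbol) can be illustrated with a small Python sketch. It is not the authors' method: the fixed prefix length PREFIX_LEN, the sorted candidate list, the ESC symbol for words missing from the dictionary, and all function names are hypothetical choices made for this illustration. No probability model or arithmetic coder is included; the sketch only shows which model would receive each symbol, and that a decoder can apply the same rule to decide when to switch into the dictionary model (here, after PREFIX_LEN letters of a word) without a separate switch codeword.

```python
import re
from typing import List, Tuple, Union

# Hypothetical parameters, assumed for this sketch only (not from the paper).
PREFIX_LEN = 3   # letters of a word coded by the character model first
ESC = -1         # dictionary-model symbol: "word not in dictionary"

Symbol = Tuple[str, Union[str, int]]  # ("char", c) or ("dict", index)
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def is_letter(c: str) -> bool:
    """Letters are ASCII A-Z/a-z, matching TOKEN_RE."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def candidates(prefix: str, dictionary: List[str]) -> List[str]:
    """Sorted suffixes of dictionary words starting with `prefix`.
    Encoder and decoder both compute this, so indices agree."""
    return sorted({w[PREFIX_LEN:] for w in dictionary
                   if len(w) >= PREFIX_LEN and w[:PREFIX_LEN] == prefix})


def encode(text: str, dictionary: List[str]) -> List[Symbol]:
    symbols: List[Symbol] = []
    for tok in TOKEN_RE.findall(text):  # alternating words / non-words
        if not is_letter(tok[0]) or len(tok) < PREFIX_LEN:
            symbols.extend(("char", c) for c in tok)  # character model only
            continue
        prefix, suffix = tok[:PREFIX_LEN], tok[PREFIX_LEN:]
        symbols.extend(("char", c) for c in prefix)   # prefix: character model
        cands = candidates(prefix, dictionary)
        if not cands:                                 # no dictionary word fits
            symbols.extend(("char", c) for c in suffix)
        elif suffix in cands:                         # whole suffix, one symbol
            symbols.append(("dict", cands.index(suffix)))
        else:                                         # out of dictionary
            symbols.append(("dict", ESC))
            symbols.extend(("char", c) for c in suffix)
    return symbols


def decode(symbols: List[Symbol], dictionary: List[str]) -> str:
    out: List[str] = []
    word = ""  # letters of the current word decoded so far
    i = 0
    while i < len(symbols):
        kind, val = symbols[i]
        i += 1
        if kind != "char":
            raise ValueError("unexpected dictionary symbol")
        out.append(val)
        word = word + val if is_letter(val) else ""
        # Same rule as the encoder: after PREFIX_LEN letters, consult the
        # dictionary model if any dictionary word shares this prefix.
        if len(word) == PREFIX_LEN:
            cands = candidates(word, dictionary)
            if cands:
                kind, idx = symbols[i]
                i += 1
                if kind != "dict":
                    raise ValueError("expected dictionary symbol")
                if idx != ESC:
                    out.append(cands[idx])  # word complete
                    word = ""
    return "".join(out)
```

Round-tripping a string such as "the compression, there!" through encode and decode recovers it, with the suffixes of "the", "compression", and "there" each emitted as one ("dict", i) symbol. In a real coder each ("char", c) would be coded by the PPM context model and each ("dict", i) by a probability model over the candidate list.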