• DocumentCode
    3220394
  • Title

    A fast lossless compression algorithm for Arabic textual images

  • Author

    AlZahir, Saif

  • Author_Institution
    Image Process. & Multimedia Lab., Univ. of Northern British Columbia, Prince George, BC, Canada
  • fYear
    2011
  • fDate
    16-18 Nov. 2011
  • Firstpage
    595
  • Lastpage
    598
  • Abstract
    In recent years, an unparalleled volume of textual information has been transported over the Internet via email, chatting, blogging, twittering, digital libraries, and information retrieval systems. Because text data now exceeds 40% of total Internet traffic, compressing textual data has become imperative. Many algorithms have been introduced and employed for this purpose, including Huffman encoding, arithmetic encoding, the Ziv-Lempel family, Dynamic Markov Compression, and the Burrows-Wheeler Transform. In this paper, a novel algorithm for compressing textual images is presented. The algorithm comprises two parts: (i) a fixed-to-variable codebook; and (ii) a row and column reduction coding scheme (RCRC). Simulation results on a large number of Arabic textual images show that this algorithm achieves a compression ratio of approximately 87%, which exceeds published results, including those of JBIG2.
  • Keywords
    Huffman codes; Internet; Markov processes; arithmetic codes; data compression; digital libraries; document image processing; electronic mail; information retrieval systems; natural language processing; social networking (online); text analysis; Arabic textual images; Burrows-Wheeler transform; Huffman encoding; Internet traffic; JBIG2; RCRC; Ziv-Lempel family; arithmetic encoding; blogging; chatting; column reduction coding scheme; compression ratio; digital library; dynamic Markov compression; email; fixed-to-variable codebook; information retrieval systems; lossless compression algorithm; row reduction coding scheme; text data; textual data compression; textual image compression; textual information; twittering; unparalleled volume; Algorithm design and analysis; Conferences; Entropy; Image coding; Matrix converters; Morphology; Vectors; Arabic text compression; binary image compression; entropy; written text compression;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Signal and Image Processing Applications (ICSIPA), 2011 IEEE International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4577-0243-3
  • Type

    conf

  • DOI
    10.1109/ICSIPA.2011.6144069
  • Filename
    6144069