DocumentCode
3220394
Title
A fast lossless compression algorithm for Arabic textual images
Author
AlZahir, Saif
Author_Institution
Image Process. & Multimedia Lab., Univ. of Northern British Columbia, Prince George, BC, Canada
fYear
2011
fDate
16-18 Nov. 2011
Firstpage
595
Lastpage
598
Abstract
In recent years, an unparalleled volume of textual information has been transported over the Internet via email, chatting, blogging, twittering, digital libraries, and information retrieval systems. As text data now exceeds 40% of the total volume of Internet traffic, compressing textual data has become imperative. Many algorithms have been introduced and employed for this purpose, including Huffman coding, arithmetic coding, the Ziv-Lempel family, Dynamic Markov Compression, and the Burrows-Wheeler Transform. In this paper, a novel algorithm for compressing textual images is presented. The algorithm comprises two parts: (i) a fixed-to-variable codebook and (ii) a row and column reduction coding scheme (RCRC). Simulation results on a large number of Arabic textual images show that this algorithm achieves a compression ratio of approximately 87%, which exceeds published results, including those of JBIG2.
Keywords
Huffman codes; Internet; Markov processes; arithmetic codes; data compression; digital libraries; document image processing; electronic mail; information retrieval systems; natural language processing; social networking (online); text analysis; Arabic textual images; Burrows-Wheeler transform; Huffman encoding; Internet traffic; JBIG2; RCRC; Ziv-Lempel family; arithmetic encoding; blogging; chatting; column reduction coding scheme; compression ratio; digital library; dynamic Markov compression; email; fixed-to-variable codebook; lossless compression algorithm; row reduction coding scheme; text data; textual data compression; textual image compression; textual information; twittering; unparalleled volume; Algorithm design and analysis; Conferences; Entropy; Image coding; Matrix converters; Morphology; Vectors; Arabic text compression; binary image compression; entropy; written text compression;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Image Processing Applications (ICSIPA), 2011 IEEE International Conference on
Conference_Location
Kuala Lumpur
Print_ISBN
978-1-4577-0243-3
Type
conf
DOI
10.1109/ICSIPA.2011.6144069
Filename
6144069