DocumentCode :
258710
Title :
Entropy of Malayalam language and text compression using Huffman coding
Author :
Kuruvila, Melbin ; Gopinath, Deepa P.
Author_Institution :
Dept. of ECE, Coll. of Eng. Trivandrum, Trivandrum, India
fYear :
2014
fDate :
17-18 Dec. 2014
Firstpage :
150
Lastpage :
155
Abstract :
Entropy is a statistical parameter that measures how much information is produced, on average, by each letter of a text in a language. Every language has hidden, statistically significant features and a certain amount of redundancy, and these can be exploited to build a text compression tool that makes better use of resources. Motivated by studies of English and other languages based on Shannon's theory, this paper presents an informational analysis of Malayalam text. The entropy of the Malayalam language is calculated to be 4.8 bits per character. The Malayalam text compressor discussed in this paper uses Huffman coding over an alphabet of Malayalam and English letters together with numerals, so that the most probable characters are represented by the fewest bits. The Huffman compression algorithm achieves a compression ratio of 66 percent on a standard Malayalam database, and the compression ratios obtained for different databases are compared.
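The two quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the entropy is a first-order (single-character) estimate, and the record does not say how the paper estimates entropy or what baseline it uses for the compression ratio.

```python
import heapq
import math
from collections import Counter


def entropy_bits_per_char(text):
    """First-order Shannon entropy, H = -sum(p * log2(p)), in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def huffman_code(text):
    """Build a prefix code in which more frequent characters get shorter codewords."""
    counts = Counter(text)
    if len(counts) == 1:  # degenerate case: only one distinct character
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {char: partial codeword}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least probable subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_code_length(text, code):
    """Mean number of coded bits per character of the text."""
    return sum(len(code[ch]) for ch in text) / len(text)


if __name__ == "__main__":
    sample = "മലയാളം ഒരു ഭാഷയാണ്"  # illustrative string, not the paper's database
    code = huffman_code(sample)
    print(entropy_bits_per_char(sample), average_code_length(sample, code))
```

For any text, the average Huffman codeword length lies between the first-order entropy and the entropy plus one bit, which is why the entropy figure serves as a lower bound for this kind of per-character coder. A compression ratio would additionally depend on the size of the uncompressed encoding being compared against, which is left out here because the record does not specify it.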
Keywords :
Huffman codes; data compression; entropy; natural language processing; statistical analysis; text analysis; English alphabets; English languages; Huffman coding technique; Huffman compression algorithm; Malayalam alphabets; Malayalam language text compressor; Shannon theory; arithmetic numbers; compression ratio; entropy; informational analysis; optimum resource use; redundancy; standard Malayalam database; statistical parameter; statistically significant features; text compression tool; text letter; Channel coding; Databases; Entropy; Huffman coding; Speech; Standards; Compression ratio; Entropy; Huffman coding; Text compression;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Systems and Communications (ICCSC), 2014 First International Conference on
Conference_Location :
Trivandrum
Print_ISBN :
978-1-4799-6012-5
Type :
conf
DOI :
10.1109/COMPSC.2014.7032638
Filename :
7032638