DocumentCode :
3012188
Title :
On the design of an effective corpus for evaluation of Bengali Text Compression Schemes
Author :
Islam, Md Rafiqul ; Rajon, S. A Ahsan
Author_Institution :
Comput. Sci. & Eng. Discipline, Khulna Univ., Khulna
fYear :
2008
fDate :
24-27 Dec. 2008
Firstpage :
236
Lastpage :
241
Abstract :
In this paper, we propose an effective platform for evaluating Bengali text compression schemes. We present a methodical study of approaches to formulating text corpora for data compression and introduce an effective corpus, named Ekushe-Khul, for evaluating Bengali text compression schemes; it is the first such initiative in the context of Bengali text compression. To design the corpus, we use the type-to-token ratio as the primary selection criterion, together with a number of secondary considerations. The paper also presents a mathematical analysis relating data compression performance to the structural aspects of corpora. The proposed corpus is effective for evaluating the compression efficiency of small and medium-sized text files.
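The type-to-token ratio (TTR) mentioned above is the number of distinct word forms (types) divided by the total number of words (tokens) in a text. The sketch below shows how a candidate text's TTR and a simple compression-efficiency figure (bits per character) might be computed. It is only an illustration under stated assumptions: the whitespace tokenizer, the PUNCTUATION set, and the use of zlib as a stand-in compressor are all hypothetical choices. They are not the tokenization rules, selection procedure, or compression schemes used for Ekushe-Khul.

```python
import zlib

# Hypothetical punctuation set stripped from token edges; includes the
# danda (U+0964) and double danda (U+0965) used in Bengali script.
# This is an assumption, not the paper's tokenization rule.
PUNCTUATION = "\u0964\u0965,.;:!?\"'()[]{}-"


def tokenize(text: str) -> list[str]:
    """Split on whitespace and strip edge punctuation (a simple heuristic)."""
    tokens = (t.strip(PUNCTUATION) for t in text.split())
    return [t for t in tokens if t]


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


def bits_per_character(text: str, level: int = 9) -> float:
    """Compressed size in bits per Unicode character.

    zlib is used here only as a stand-in compressor (an assumption);
    any scheme under evaluation could be substituted.
    """
    if not text:
        raise ValueError("text is empty")
    compressed = zlib.compress(text.encode("utf-8"), level)
    return len(compressed) * 8 / len(text)


if __name__ == "__main__":
    sample = "আমি তুমি আমি।"
    print(round(type_token_ratio(sample), 3))  # → 0.667
    print(bits_per_character(sample))
```

A plain whitespace split is used rather than a `\w+` regular expression because Python's `\w` does not match Bengali vowel signs (combining marks), which would break words apart.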
Keywords :
data compression; natural languages; text analysis; Bengali text compression scheme evaluation; corpus design; mathematical analysis; type-to-token ratio; Computer science; Costs; Data compression; Design engineering; Dictionaries; Image coding; Information technology; Mathematical analysis; Performance analysis; Performance evaluation; Bengali Text; Bengali Text Compression; Compression Efficiency; Corpus; Data Management; Dictionary Coding; Evaluation Platform; Type to Token Ratio (TTR);
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
Conference_Location :
Khulna
Print_ISBN :
978-1-4244-2135-0
Electronic_ISBN :
978-1-4244-2136-7
Type :
conf
DOI :
10.1109/ICCITECHN.2008.4802992
Filename :
4802992