DocumentCode
3012188
Title
On the design of an effective corpus for evaluation of Bengali Text Compression Schemes
Author
Islam, Md Rafiqul ; Rajon, S. A. Ahsan
Author_Institution
Comput. Sci. & Eng. Discipline, Khulna Univ., Khulna
fYear
2008
fDate
24-27 Dec. 2008
Firstpage
236
Lastpage
241
Abstract
In this paper, we propose an effective platform for evaluating Bengali text compression schemes. We present a methodical study of approaches to constructing text corpora for data compression and introduce Ekushe-Khul, a corpus for evaluating Bengali text compression schemes, which is the first such initiative in the context of Bengali text compression. In designing the corpus, we use the type-to-token ratio as the primary selection criterion, together with a number of secondary considerations. The paper also presents a mathematical analysis of data compression performance in relation to the structural aspects of corpora. The proposed corpus is effective for evaluating compression efficiency on small and medium-sized text files.
Keywords
data compression; natural languages; text analysis; Bengali text compression scheme evaluation; corpus design; mathematical analysis; type-to-token ratio; Computer science; Costs; Data compression; Design engineering; Dictionaries; Image coding; Information technology; Mathematical analysis; Performance analysis; Performance evaluation; Bengali Text; Bengali Text Compression; Compression Efficiency; Corpus; Data Management; Dictionary Coding; Evaluation Platform; Type to Token Ratio (TTR)
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
Conference_Location
Khulna
Print_ISBN
978-1-4244-2135-0
Electronic_ISBN
978-1-4244-2136-7
Type
conf
DOI
10.1109/ICCITECHN.2008.4802992
Filename
4802992