Title :
Morphology based text compression
Author :
Göksu, Hayriye ; Diri, Banu
Abstract :
With the rapid growth of online information, the number of documents in electronic media has increased considerably. Text compression has therefore become more important for easy and quick access to this information. In recent years, part of the work in the field of text compression has focused on the morphological structure of the language. In this study, Turkish and English documents are compressed using different decomposition (parsing) methods, and the effect of these methods on compression efficiency is investigated. The documents are first parsed according to their morphological structure. In the next stage, the parsed documents are compressed with the Huffman compression method. In total, 10 different parsing techniques were created and tried on different corpora.
Keywords :
data compression; grammars; natural language processing; text analysis; English document; Huffman compression method; Turkish document; electronic media; morphological structure; morphology based text compression; Computers; Conferences; Data compression; Entropy; Information technology; Markov processes; Morphology
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2010 IEEE 18th
Conference_Location :
Diyarbakir
Print_ISBN :
978-1-4244-9672-3
DOI :
10.1109/SIU.2010.5651231