DocumentCode :
1560564
Title :
A method for compressing lexicons
Author :
Ristov, Strahil ; Laporte, Eric
Author_Institution :
Ruđer Bošković Institute, Zagreb, Croatia
fYear :
2002
fDate :
2002
Firstpage :
470
Abstract :
Summary form only given. Lexicon lookup is an essential part of almost every natural language processing system. A natural language lexicon is a set of strings, each consisting of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word, and it should be both small and fast. We propose a method for lexicon compression based on a very efficient trie compression method and the inverted file paradigm. The method was applied to a French phonetic and grammatical electronic dictionary of 664,000 strings (18 Mbyte) used for spelling-to-phonetics conversion. Entries in the lexicon are strings consisting of a word, its phonetic transcription, and some additional codes.
Keywords :
data compression; dictionaries; natural languages; speech processing; string matching; tree data structures; French dictionary; grammatical electronic dictionary; inverted file paradigm; lexicon compression; lexicon lookup; linguistic data structure; natural language processing system; phonetic electronic dictionary; spelling-to-phonetics conversion; strings; trie compression; Application software; Automata; Data compression; Dictionaries; Natural language processing; Natural languages; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2002. Proceedings. DCC 2002
ISSN :
1068-0314
Print_ISBN :
0-7695-1477-4
Type :
conf
DOI :
10.1109/DCC.2002.1000013
Filename :
1000013