DocumentCode
1560564
Title
A method for compressing lexicons
Author
Ristov, Strahil ; Laporte, Eric
Author_Institution
Boskovic (R.) Inst., Zagreb, Croatia
fYear
2002
fDate
2002
Firstpage
470
Abstract
Summary form only given. Lexicon lookup is an essential part of almost every natural language processing system. A natural language lexicon is a set of strings in which each string consists of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word; it should be small and fast. We propose a method for lexicon compression based on a very efficient trie compression method and the inverted file paradigm. The method was applied to a 664,000-string, 18-Mbyte French phonetic and grammatical electronic dictionary for spelling-to-phonetics conversion. Entries in the lexicon are strings consisting of a word, its phonetic transcription, and some additional codes.
Keywords
data compression; dictionaries; natural languages; speech processing; string matching; tree data structures; French dictionary; grammatical electronic dictionary; inverted file paradigm; lexicon compression; lexicon lookup; linguistic data structure; natural language processing system; phonetic electronic dictionary; spelling-to-phonetics conversion; strings; trie compression; Application software; Automata; Data compression; Dictionaries; Natural language processing; Natural languages; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 2002. Proceedings. DCC 2002
ISSN
1068-0314
Print_ISBN
0-7695-1477-4
Type
conf
DOI
10.1109/DCC.2002.1000013
Filename
1000013