• DocumentCode
    1560564
  • Title

    A method for compressing lexicons

  • Author

    Ristov, Strahil ; Laporte, Eric

  • Author_Institution
    Boskovic (R.) Inst., Zagreb, Croatia
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    470
  • Abstract
    Summary form only given. Lexicon lookup is an essential part of almost every natural language processing system. A natural language lexicon is a set of strings, each consisting of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word, and it should be both small and fast. We propose a method for lexicon compression based on a very efficient trie compression method and the inverted file paradigm. The method was applied to a French phonetic and grammatical electronic dictionary of 664,000 strings (18 Mbyte) used for spelling-to-phonetics conversion. Entries in the lexicon are strings consisting of a word, its phonetic transcription, and some additional codes.
  • Keywords
    data compression; dictionaries; natural languages; speech processing; string matching; tree data structures; French dictionary; grammatical electronic dictionary; inverted file paradigm; lexicon compression; lexicon lookup; linguistic data structure; natural language processing system; phonetic electronic dictionary; spelling-to-phonetics conversion; strings; trie compression; Application software; Automata; Data compression; Dictionaries; Natural language processing; Natural languages; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2002. Proceedings. DCC 2002
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-1477-4
  • Type

    conf

  • DOI
    10.1109/DCC.2002.1000013
  • Filename
    1000013
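
  • Illustration
    The abstract describes a lexicon as a structure that returns linguistic data for an input word, built on a trie and the inverted file paradigm. The record gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method: words are stored in a plain (uncompressed) trie, and each distinct data string is stored once in a shared table that trie nodes reference by index. The class name `TrieLexicon`, its methods, and the sample entries are all hypothetical.

```python
class TrieLexicon:
    """Toy lexicon: a trie over words plus a shared table of data strings.

    Illustrative assumption only; the paper's trie compression is not shown.
    """

    def __init__(self):
        self.root = {}        # nested dicts: character -> child node
        self.data_table = []  # each distinct data string is stored once
        self.data_index = {}  # data string -> its position in data_table

    def _intern(self, data):
        # Return the table index for `data`, adding it on first sight.
        if data not in self.data_index:
            self.data_index[data] = len(self.data_table)
            self.data_table.append(data)
        return self.data_index[data]

    def add(self, word, data):
        # Walk or create the path for `word`, then attach a data index.
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # The key None marks "a word ends here" and holds the data indices.
        node.setdefault(None, []).append(self._intern(data))

    def lookup(self, word):
        # Return every data string recorded for `word` (empty if unknown).
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return []
        return [self.data_table[i] for i in node.get(None, [])]


# Made-up entries: a word paired with a placeholder "transcription,code".
lex = TrieLexicon()
lex.add("chat", "Sa,N")
lex.add("chats", "Sa,N")   # same data string, so the table does not grow
lex.add("chat", "tSat,N")
```

    Sharing the data table means that words with identical linguistic data cost only one index each, which is the kind of redundancy a real compressed lexicon would exploit far more aggressively.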