• DocumentCode
    2879700
  • Title
    All-Character Index Dictionary
  • Author
    Yin, Wensheng ; Guo, Feifei
  • Author_Institution
    Sch. of Mech. Sci. & Eng., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2009
  • fDate
    19-20 Dec. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The design of the dictionary index structure is the basis of Chinese information processing, and its properties greatly influence the effectiveness of Chinese word segmentation. In this paper, a hash table is first established for all commonly used characters, so that every character in a word can be found quickly. Then, for each character, the number of words containing that character and the character's composition relationship within those words are recorded in a word chain, forming the all-character index structure. Next, the paper discusses construction and maintenance methods for the dictionary and presents algorithms for constructing the dictionary and for adding and deleting words. Finally, a Chinese word segmentation algorithm based on the all-character index dictionary is proposed, and the dictionary is compared with traditional dictionaries in terms of construction, query speed, and function.
  • Keywords
    dictionaries; indexing; word processing; Chinese information processing; Chinese word segmentation; all-character index dictionary; construction methods; dictionary adding algorithms; dictionary constructing algorithms; dictionary deleting algorithms; dictionary index structure; maintenance methods; Communication standards; Design engineering; Encyclopedias; Explosions; Information processing; Mechanical factors; Natural languages; Shape; all-character index; dictionary; first-character index; index
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-4994-1
  • Type
    conf
  • DOI
    10.1109/ICIECS.2009.5367176
  • Filename
    5367176
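
The index structure summarized in the abstract might be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the class name `AllCharacterIndex`, its methods, the per-character layout (word mapped to character positions), and the use of forward maximum matching for segmentation are all assumptions made for this example, since the record does not give the paper's actual data layout or algorithms.

```python
from collections import defaultdict


class AllCharacterIndex:
    """Toy all-character index: every character of every word is a hash key.

    Hypothetical layout (not taken from the paper):
    character -> {word: [positions of that character in the word]}
    """

    def __init__(self, words=()):
        self.index = defaultdict(dict)
        for word in words:
            self.add(word)

    def add(self, word):
        # Register the word under each of its characters, with positions.
        for pos, ch in enumerate(word):
            self.index[ch].setdefault(word, []).append(pos)

    def delete(self, word):
        # Remove the word from the chain of every character it contains.
        for ch in set(word):
            entries = self.index.get(ch)
            if entries is None:
                continue
            entries.pop(word, None)
            if not entries:
                del self.index[ch]

    def contains(self, word):
        # A word is present if it appears in the chain of its first character.
        return bool(word) and word in self.index.get(word[0], {})

    def words_with(self, ch):
        # All words containing ch at any position (the "all-character" lookup).
        return sorted(self.index.get(ch, {}))

    def segment(self, text, max_len=4):
        # Forward maximum matching, used here as a stand-in segmenter.
        out, i = [], 0
        while i < len(text):
            for size in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or self.contains(piece):
                    out.append(piece)
                    i += size
                    break
        return out
```

Unlike a first-character index, a lookup such as `words_with("国")` here returns words in which the character appears at any position, which is the property the all-character index is named for.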