• DocumentCode
    2259024
  • Title

    Acquiring Korean Lexical Entry from a Raw Corpus

  • Author

    Yu, Wonhee ; Park, Kinam ; Jung, Soonyoung ; Lim, Heuiseok

  • Author_Institution
    Dept. of Comput. Sci. Educ., Korea Univ., Seoul, South Korea
  • fYear
    2010
  • fDate
    11-13 Aug. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme using the entropy measure of syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precisions of 100% and 99.04%, respectively.
  • Keywords
    data acquisition; entropy; natural language processing; text analysis; unsupervised learning; Korean lexical entry; computational lexical entry acquisition model; core full form; entropy measure; full form acquisition module; mental lexicon; morpheme acquisition module; raw corpus; representation model; substring; Computational modeling; Context; Dictionaries; Entropy; Guidelines; Humans; Iron
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Convergence and Services (ITCS), 2010 2nd International Conference on
  • Conference_Location
    Cebu
  • Print_ISBN
    978-1-4244-7584-1
  • Electronic_ISBN
    978-1-4244-7584-1
  • Type

    conf

  • DOI
    10.1109/ITCS.2010.5581289
  • Filename
    5581289
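
The morpheme step summarized in the abstract (pick substrings shared by several full-forms, then check them with a syllable entropy measure) can be illustrated with a small sketch. This is not the authors' implementation: the record does not give the exact entropy formula or thresholds, so the sketch assumes a common stand-in, the Shannon entropy of the syllable that follows a candidate (right branching entropy). The function names, the `min_forms` parameter, and the toy word list are all hypothetical. Each precomposed Hangul syllable is one Python character, so indexing a string walks syllable by syllable.

```python
import math
from collections import Counter, defaultdict


def candidate_substrings(full_forms, min_forms=2, min_len=1):
    """Return substrings that occur in at least `min_forms` distinct full-forms.

    `min_forms` and `min_len` are illustrative parameters, not values
    taken from the paper.
    """
    seen_in = defaultdict(set)
    for form in set(full_forms):
        for i in range(len(form)):
            for j in range(i + min_len, len(form) + 1):
                seen_in[form[i:j]].add(form)
    return {s for s, forms in seen_in.items() if len(forms) >= min_forms}


def right_branching_entropy(candidate, full_forms, end="$"):
    """Shannon entropy (in bits) of the syllable that follows `candidate`.

    The end of a full-form is counted as the pseudo-syllable `end`.
    A high value means many different continuations, which is a common
    hint that `candidate` ends at a morpheme boundary (assumed measure).
    """
    following = Counter()
    for form in full_forms:
        start = form.find(candidate)
        while start != -1:
            nxt = start + len(candidate)
            following[form[nxt] if nxt < len(form) else end] += 1
            start = form.find(candidate, start + 1)
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in following.values())


if __name__ == "__main__":
    # Toy full-forms (hypothetical data): a noun stem plus different particles.
    forms = ["학교에", "학교는", "학교가", "학생이"]
    for cand in sorted(candidate_substrings(forms)):
        print(cand, round(right_branching_entropy(cand, forms), 3))
```

In this toy data the stem 학교 is followed by three different particles, so its entropy is higher than that of the shorter prefix 학, which is mostly followed by 교. A real system would compare such scores against a tuned threshold, which this record does not specify.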