DocumentCode
2259024
Title
Acquiring Korean Lexical Entry from a Raw Corpus
Author
Yu, Wonhee ; Park, Kinam ; Jung, Soonyoung ; Lim, Heuiseok
Author_Institution
Dept. of Comput. Sci. Educ., Korea Univ., Seoul, South Korea
fYear
2010
fDate
11-13 Aug. 2010
Firstpage
1
Lastpage
6
Abstract
This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of full-form and morpheme acquisition modules. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then confirmed as a morpheme using the entropy measure of syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precision of 100% and 99.04%, respectively.
Keywords
data acquisition; entropy; natural language processing; text analysis; unsupervised learning; Korean lexical entry; computational lexical entry acquisition model; core full form; entropy measure; full form acquisition module; mental lexicon; morpheme acquisition module; raw corpus; representation model; substring; unsupervised learning; Computational modeling; Context; Dictionaries; Entropy; Guidelines; Humans; Iron
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology Convergence and Services (ITCS), 2010 2nd International Conference on
Conference_Location
Cebu
Print_ISBN
978-1-4244-7584-1
Electronic_ISBN
978-1-4244-7584-1
Type
conf
DOI
10.1109/ITCS.2010.5581289
Filename
5581289