DocumentCode :
2259024
Title :
Acquiring Korean Lexical Entry from a Raw Corpus
Author :
Yu, Wonhee ; Park, Kinam ; Jung, Soonyoung ; Lim, Heuiseok
Author_Institution :
Dept. of Comput. Sci. Educ., Korea Univ., Seoul, South Korea
fYear :
2010
fDate :
11-13 Aug. 2010
Firstpage :
1
Lastpage :
6
Abstract :
This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus through unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme by using the entropy measure of the syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precisions of 100% and 99.04%, respectively.
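The abstract describes both modules only at a high level. The Python sketch below illustrates one possible reading of them and is not the authors' implementation. The function names, the threshold values, the definition of recency (number of tokens elapsed since a form's last occurrence), the restriction of candidate substrings to shared prefixes, and the use of successor entropy (the entropy of the syllable that follows a candidate across the full-forms containing it) are all assumptions made for illustration. Each Hangul character is treated as one syllable.

```python
import math
from collections import Counter, defaultdict


def acquire_full_forms(tokens, freq_threshold, recency_threshold):
    """Hypothetical full-form acquisition: keep a form if it occurs at least
    freq_threshold times and no more than recency_threshold tokens have
    passed since its last occurrence (an assumed notion of recency)."""
    counts = Counter()
    last_seen = {}
    for i, tok in enumerate(tokens):
        counts[tok] += 1
        last_seen[tok] = i
    last_index = len(tokens) - 1
    return {
        form
        for form, c in counts.items()
        if c >= freq_threshold and last_index - last_seen[form] <= recency_threshold
    }


def candidate_morphemes(full_forms, min_forms=2):
    """Assumed candidate step: a proper prefix shared by at least
    min_forms distinct full-forms becomes a candidate morpheme."""
    support = defaultdict(set)
    for form in full_forms:
        for k in range(1, len(form)):  # proper prefixes only
            support[form[:k]].add(form)
    return {s: forms for s, forms in support.items() if len(forms) >= min_forms}


def successor_entropy(candidate, forms):
    """Entropy (bits) of the syllable that follows the candidate in each form.
    Stand-in for the paper's unspecified syllable entropy measure."""
    following = Counter(f[len(candidate)] for f in forms)
    total = sum(following.values())
    return -sum((c / total) * math.log2(c / total) for c in following.values())


def acquire_morphemes(full_forms, min_forms=2, min_entropy=1.0):
    """Corroborate candidates whose successor entropy reaches min_entropy
    (a hypothetical threshold), i.e. candidates followed by varied syllables."""
    candidates = candidate_morphemes(full_forms, min_forms)
    return {
        s for s, forms in candidates.items()
        if successor_entropy(s, forms) >= min_entropy
    }


if __name__ == "__main__":
    tokens = ["학교에", "학교를", "학교에", "학생이", "학교를", "학교가", "학교에"]
    print(acquire_full_forms(tokens, freq_threshold=2, recency_threshold=2))
    print(acquire_morphemes({"학교에", "학교를", "학교가", "학생이"}))
```

With the full-forms 학교에, 학교를, 학교가 and 학생이, the prefix 학교 is followed by three different syllables (log2 3, about 1.58 bits) and passes the assumed 1.0-bit threshold, whereas 학 is followed mostly by 교 (about 0.81 bits) and is rejected. A high variety of following syllables is taken here as evidence of a morpheme boundary; the paper's actual measure and thresholds may differ.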
Keywords :
data acquisition; entropy; natural language processing; text analysis; unsupervised learning; Korean lexical entry; computational lexical entry acquisition model; core full form; entropy measure; full form acquisition module; mental lexicon; morpheme acquisition module; raw corpus; representation model; substring; unsupervised learning; Computational modeling; Context; Dictionaries; Entropy; Guidelines; Humans; Iron;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Information Technology Convergence and Services (ITCS), 2010 2nd International Conference on
Conference_Location :
Cebu
Print_ISBN :
978-1-4244-7584-1
Electronic_ISBN :
978-1-4244-7584-1
Type :
conf
DOI :
10.1109/ITCS.2010.5581289
Filename :
5581289