DocumentCode :
2535659
Title :
Feature-based Thai unknown word boundary identification using Winnow
Author :
Charoenpornsawat, Paisarn ; Kijsirikul, Boonserm ; Meknavin, Surapant
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
fYear :
1998
fDate :
24-27 Nov 1998
Firstpage :
547
Lastpage :
550
Abstract :
This paper addresses the problem of Thai unknown word boundary identification. Unknown words are becoming a major problem in many natural language processing tasks, such as word segmentation, information retrieval, and part-of-speech tagging. In Thai, words are written consecutively without delimiters, so finding the boundary of an unknown word is difficult. We propose a feature-based approach to identifying Thai unknown word boundaries. A feature can be anything that tests for specific information in the context around the target unknown word. To automatically extract features from a training corpus, we use a machine learning algorithm, namely Winnow.
Keywords :
feature extraction; information retrieval; learning (artificial intelligence); natural languages; speech processing; Thai unknown word boundary identification; Winnow; feature-based approach; machine learning algorithm; natural language processing; speech tagging; target unknown words; training corpus; word segmentation information retrieval; Character generation; Data mining; Dictionaries; Feature extraction; Machine learning algorithms; Natural language processing; Natural languages; Speech processing; Tagging; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Circuits and Systems, 1998. IEEE APCCAS 1998. The 1998 IEEE Asia-Pacific Conference on
Conference_Location :
Chiangmai
Print_ISBN :
0-7803-5146-0
Type :
conf
DOI :
10.1109/APCCAS.1998.743878
Filename :
743878