Title :
BEST 2009 : Thai word segmentation software contest
Author :
Kosawat, Krit ; Boriboon, Monthika ; Chootrakool, Patcharika ; Chotimongkol, Ananlada ; Klaithin, Supon ; Kongyoung, Sarawoot ; Kriengket, Kanyanut ; Phaholphinyo, Sitthaa ; Purodakananda, Sumonmas ; Thanakulwarapas, Tipraporn ; Wutiwiwatchai, Chai
Author_Institution :
Human Language Technol. Lab. (HLT), Nat. Sci. & Technol. Dev. Agency (NSTDA), Pathumthani, Thailand
Abstract :
This is a non-technical paper describing how and why we organized BEST 2009, the first contest in the series "Benchmark for Enhancing the Standard of Thai language processing", which is expected to help accelerate the progress of natural language processing technology in Thailand by assembling three essential components: common standards, resources, and researchers. The BEST 2009: Thai word segmentation software contest is the first shared task on Thai NLP to exercise this assemblage. It aimed to find the best algorithms for correctly dividing unsegmented Thai script into words according to guidelines previously prepared by experts from several research institutes and universities. Thai word-segmented corpora of 5 million words have been developed as a training set, and another 600 K words as a test set. The evaluation procedure and protocol have been designed. The process and the results of the contest are reported.
Keywords :
natural language processing; text analysis; BEST 2009; NLP; Thai word segmentation software contest; Acceleration; Assembly; Educational institutions; Guidelines; Paper technology; Protocols; Software algorithms; Software standards; Testing
Conference_Title :
Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
Conference_Location :
Bangkok
Print_ISBN :
978-1-4244-4138-9
Electronic_ISBN :
978-1-4244-4139-6
DOI :
10.1109/SNLP.2009.5340941