DocumentCode :
3008911
Title :
Isarn Dharma word segmentation
Author :
Somsap, Sittichai ; Seresangtakul, Pusadee
Author_Institution :
Dept. of Comput. Sci., Khon Kaen Univ., Khon Kaen, Thailand
fYear :
2013
fDate :
25-28 Nov. 2013
Firstpage :
53
Lastpage :
57
Abstract :
This paper presents an Isarn Dharma word segmentation method based on the Isarn Dharma writing system and a dictionary. Input text is first segmented into sequences of Isarn Dharma Character Clusters (IDCCs), where each IDCC is a group of characters that the Isarn Dharma writing system treats as inseparable. The IDCC sequence is then matched against the dictionary with the IDCC longest matching algorithm to find the most suitable word segmentation. Grouping rules are then applied to group adjacent remaining IDCCs that do not match any Isarn word in the dictionary. To evaluate the proposed technique, Isarn literature, Jataka, legend, and Buddha foretell texts were used as test data, and the results were compared with longest matching and with IDCC longest matching. The experimental results showed F-measures of 80.15%, 85.06%, and 86.07% for longest matching, IDCC longest matching, and the proposed method, respectively.
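The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the text has already been split into clusters, uses Latin placeholder strings instead of real IDCCs, stores dictionary words as tuples of clusters, and replaces the paper's grouping rules (which the abstract does not specify) with a hypothetical rule that merges each run of adjacent unmatched clusters into one token. The function name and the max_len parameter are likewise assumptions.

```python
from typing import List, Set, Tuple


def longest_match_segment(
    clusters: List[str],
    dictionary: Set[Tuple[str, ...]],
    max_len: int = 6,
) -> List[List[str]]:
    """Greedy longest matching over a sequence of character clusters.

    Assumption: `dictionary` holds words as tuples of clusters, and
    adjacent unmatched clusters are merged into one token (a stand-in
    for the paper's grouping rules, which are not given in the abstract).
    """
    words: List[List[str]] = []
    pending: List[str] = []  # run of adjacent clusters with no dictionary match
    i = 0
    while i < len(clusters):
        match_len = 0
        # Try the longest candidate first, then shrink one cluster at a time.
        for n in range(min(max_len, len(clusters) - i), 0, -1):
            if tuple(clusters[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            if pending:  # flush the unmatched run before the matched word
                words.append(pending)
                pending = []
            words.append(clusters[i:i + match_len])
            i += match_len
        else:
            pending.append(clusters[i])
            i += 1
    if pending:
        words.append(pending)
    return words


# Placeholder clusters; real input would be IDCCs in the Isarn Dharma script.
toy_dictionary = {("ka", "la"), ("ka",), ("ma",)}
segments = longest_match_segment(["ka", "la", "ma", "xx", "yy", "ka"], toy_dictionary)
```

Matching on whole clusters rather than single characters means a candidate word boundary can never fall inside an inseparable cluster, which is the motivation the abstract gives for segmenting into IDCCs first.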
Keywords :
dictionaries; natural language processing; pattern clustering; text analysis; Buddha foretell; F-measures; IDCC longest-matching algorithm; IDCC sequences; Isarn Dharma character cluster sequences; Isarn Dharma word segmentation; Isarn Dharma writing system; Isarn literature; Jataka; dictionary; efficiency evaluation; grouping rules; input text segmentation; inseparable Isarn Dharma characters; legend foretell; testing data; Accuracy; Clustering algorithms; Computer science; Dictionaries; Educational institutions; Information technology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Control, Automation and Information Sciences (ICCAIS), 2013 International Conference on
Conference_Location :
Nha Trang
Print_ISBN :
978-1-4799-0569-0
Type :
conf
DOI :
10.1109/ICCAIS.2013.6720529
Filename :
6720529