Title :
Segmentation and Tagging of Oracle Inscriptions Based on Lucene and Dictionary
Author :
Kai Jin-yu ; Li Na ; Liu Yong-ge
Author_Institution :
Sch. of Comput. & Inf. Eng., Anyang Normal Univ. Oracle Inf. Process. Key Lab., Anyang, China
Abstract :
Segmentation and part-of-speech tagging of Oracle inscriptions are prerequisites for building an Oracle corpus and for computer-aided textual research and explication of the inscriptions. For segmentation, this paper proposes a positive match cut algorithm that uses a Lucene-based language analyzer supplemented with an Oracle dictionary. The words segmented by the algorithm are then tagged. Experiments show an accuracy of more than 90%.
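As a rough illustration of dictionary-driven segmentation followed by tagging, the sketch below assumes that "positive match cut" corresponds to the common forward maximum matching scheme; the abstract does not confirm this. It does not use Lucene, and the function names, the toy dictionary, and the "UNK" fallback tag are all hypothetical placeholders rather than details from the paper.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary entry that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)  # guard against a zero or negative window
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan advances.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def tag(tokens, pos_dictionary, unknown="UNK"):
    """Look up each token's tag; tokens missing from the dictionary get `unknown`."""
    return [(t, pos_dictionary.get(t, unknown)) for t in tokens]


# Hypothetical toy dictionary mapping entries to tags (Latin letters stand in
# for Oracle characters; this is not data from the paper).
toy_dictionary = {"ab": "N", "abc": "V", "d": "N"}

segmented = forward_max_match("abcdab", toy_dictionary)
tagged = tag(segmented, toy_dictionary)
```

With the toy dictionary, the longest entry "abc" is preferred over "ab" at the start of the string, which is the defining behaviour of forward maximum matching.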
Keywords :
dictionaries; grammars; programming languages; text analysis; Lucene; Oracle corpus; Oracle inscriptions segmentation; Oracle inscriptions tagging; computer-aided Oracle textual research; dictionary; language analyzer; part of speech tagging; positive match cut algorithm; segmented words; Accuracy; Computers; Indexes; Laboratories; Speech; Tagging; Oracle inscriptions; segmentation
Conference_Title :
Multimedia Information Networking and Security (MINES), 2010 International Conference on
Conference_Location :
Nanjing, Jiangsu
Print_ISBN :
978-1-4244-8626-7
Electronic_ISBN :
978-0-7695-4258-4
DOI :
10.1109/MINES.2010.133