DocumentCode :
542316
Title :
Automatic new word extraction method
Author :
Shi, Qin ; Shen, Li Qin ; Chai, Hai Xin
Author_Institution :
IBM China Research Laboratory, China
Volume :
1
fYear :
2002
fDate :
13-17 May 2002
Abstract :
New words are difficult to extract automatically in languages whose written text has no word boundaries, such as Chinese and Japanese. In this paper, we present a statistical method for extracting new words from a large corpus with no word boundaries. Based on the Generalized Suffix Tree (GST) data structure, we define the NWP (New Word Pattern) and the SBP (Segmentation Boundary Pattern) to separate input strings into small pieces, and offer a practical and efficient algorithm for obtaining the proper words from the GST.
Keywords :
Manuals;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5743876
Filename :
5743876