DocumentCode :
3102777
Title :
Improving Chinese named entity recognition with lexical information
Author :
Fu, Guo-hong
Author_Institution :
Sch. of Comput. Sci. & Technol., Heilongjiang Univ., Harbin, China
Volume :
6
fYear :
2009
fDate :
12-15 July 2009
Firstpage :
3487
Lastpage :
3491
Abstract :
Named entity recognition (NER) plays a critical role in many natural language processing applications. Chinese NER is usually formalized as a chunking task. However, most formulations do not distinguish named entities from common words, which makes it difficult to exploit lexical cues for NER. In this paper we propose a two-level IOB2 representation that merges lexical chunks and entity chunks, and we develop a morpheme-based chunking system for Chinese NER. The system works in three main steps. Given a plain Chinese sentence, a morpheme segmenter first splits it into a sequence of morphemes. A lexical chunker then tags each morpheme with a lexical chunk tag that indicates its position within a word of a particular type. Finally, an entity chunker labels each morpheme with a hybrid chunk tag that also carries the boundary and category of the entity it belongs to, if any. Our experiments on the IEER-99 and MET2 data show that using entity-internal part-of-speech information significantly improves NER performance. We also show that the quality of lexical chunking is important for NER results.
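The idea of a two-level IOB2 representation, where each morpheme carries both a lexical (word-level) chunk tag and an entity chunk tag, can be illustrated with a minimal sketch. It assumes single characters as morphemes, a hypothetical combined tag format `LEXICAL|ENTITY`, and illustrative part-of-speech labels (`ns`, `n`, `p`); the function names `iob2`, `two_level_tags`, and `decode_entities` are likewise hypothetical. This is not the paper's actual tag set or implementation, and it covers only the encoding and decoding of tags, not the segmenter or the trained chunkers.

```python
from typing import List, Optional, Tuple

# A span is (start, end, label) over morpheme indices; end is exclusive.
Span = Tuple[int, int, str]


def iob2(n: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans over n units as IOB2 tags."""
    tags = ["O"] * n
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first unit of a chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining units of the chunk
    return tags


def two_level_tags(morphemes: List[str],
                   word_spans: List[Span],
                   entity_spans: List[Span]) -> List[str]:
    """Hypothetical hybrid tag per morpheme: lexical IOB2 tag | entity IOB2 tag."""
    n = len(morphemes)
    lexical = iob2(n, word_spans)
    entity = iob2(n, entity_spans)
    return [f"{lex}|{ent}" for lex, ent in zip(lexical, entity)]


def decode_entities(tags: List[str]) -> List[Span]:
    """Recover entity spans from the entity half of the hybrid tags.

    An I- tag that does not continue an open chunk of the same label is
    leniently treated as the start of a new chunk.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O|O"]):  # sentinel closes a final chunk
        ent = tag.split("|", 1)[1]
        if ent.startswith("I-") and label == ent[2:]:
            continue                          # still inside the open chunk
        if label is not None:
            spans.append((start, i, label))   # close the open chunk
            start, label = None, None
        if ent.startswith(("B-", "I-")):
            start, label = i, ent[2:]         # open a new chunk
    return spans


# Usage: "黑龙江大学在哈尔滨" (Heilongjiang University is in Harbin).
morphemes = ["黑", "龙", "江", "大", "学", "在", "哈", "尔", "滨"]
word_spans = [(0, 3, "ns"), (3, 5, "n"), (5, 6, "p"), (6, 9, "ns")]
entity_spans = [(0, 5, "ORG"), (6, 9, "LOC")]

tags = two_level_tags(morphemes, word_spans, entity_spans)
for m, t in zip(morphemes, tags):
    print(m, t)
print(decode_entities(tags))  # [(0, 5, 'ORG'), (6, 9, 'LOC')]
```

Keeping the lexical tag alongside the entity tag means a sequence labeler predicting the entity half can condition on word boundaries and word types, which is the kind of lexical cue the abstract says plain entity chunking tends to ignore.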
Keywords :
natural language processing; text analysis; Chinese named entity recognition; entity chunks; lexical chunks; lexical information; morpheme-based chunking system; Application software; Computer science; Cybernetics; Data mining; Machine learning; Natural language processing; Testing; Text mining; Text recognition; White spaces; Entity chunking; Information extraction; Named entity recognition; lexical chunking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212793
Filename :
5212793