Title :
Inducing Gazetteer for Chinese Named Entity Recognition Based on Local High-Frequent Strings
Author :
Pang, Wenbo ; Fan, Xiaozhong
Author_Institution :
Sch. of Comput. & Technol., Beijing Inst. of Technol., Beijing, China
Abstract :
Gazetteers, or entity dictionaries, are important for named entity recognition (NER). Although the dictionaries that previous methods extract automatically from a corpus, the web, or Wikipedia are very large, they still miss some entities, especially domain-specific ones. We present a novel method of automatic entity dictionary induction that constructs a dictionary more specific to the text being processed, at a much lower computational cost than previous methods. It extracts the local high-frequent strings in a document as candidate entities and filters out invalid candidates using the accessor variety (AV) as the entity criterion. Experiments show that the resulting dictionary effectively improves the performance of a high-precision NER baseline.
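The two steps named in the abstract (collecting strings that recur within one document, then filtering them by accessor variety) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the length bounds, the frequency threshold, and the AV threshold are all assumptions chosen for illustration, and AV is taken here in its common form as the smaller of the number of distinct left neighbours and the number of distinct right neighbours of a string.

```python
from collections import Counter


def candidate_strings(text, min_len=2, max_len=4, min_freq=2):
    """Count substrings of length min_len..max_len and keep the frequent ones.

    The length bounds and min_freq are illustrative assumptions, not values
    taken from the paper.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_freq}


def accessor_variety(text, s):
    """Return min(distinct left neighbours, distinct right neighbours) of s.

    An occurrence at the very start or end of the text gets its own unique
    boundary marker, so each such occurrence counts as a distinct neighbour
    (a simplifying assumption).
    """
    left, right = set(), set()
    start = text.find(s)
    while start != -1:
        end = start + len(s)
        left.add(text[start - 1] if start > 0 else ("<s>", start))
        right.add(text[end] if end < len(text) else ("</s>", start))
        start = text.find(s, start + 1)  # step by 1 to allow overlaps
    return min(len(left), len(right))


def induce_gazetteer(text, av_threshold=2, **kwargs):
    """Keep frequent candidates whose AV reaches av_threshold (hypothetical)."""
    candidates = candidate_strings(text, **kwargs)
    return {s for s in candidates if accessor_variety(text, s) >= av_threshold}
```

On the toy string "aXYZb cXYZd eXYZf", the fragments "XY" and "YZ" are frequent but always have the same neighbour on one side, so only "XYZ" passes the filter. A real system would operate on Chinese character sequences within a single document and would tune the thresholds empirically.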
Keywords :
natural language processing; Chinese named entity recognition; accessor variety; automatic entity dictionary induction; gazetteer; information extraction; local high-frequent strings; Computational efficiency; Conference management; Data mining; Dictionaries; Filters; Frequency; Information technology; Tagging; Testing; Wikipedia; named entity recognition
Conference_Title :
Future Information Technology and Management Engineering, 2009. FITME '09. Second International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-5339-9
DOI :
10.1109/FITME.2009.95