Title :
The Research and Application about the Information Extraction in Chinese Domain
Author :
Zhang, Suxiang ; Wen, Juan ; Qin, Ying ; Wang, Xiaojie ; Zhong, Yixin
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Abstract :
This paper proposes a prototype information service system that extracts information of interest from unstructured text and delivers it to users through database search. To achieve this goal, two fundamental problems, named entity recognition and relation extraction, were studied using the maximum entropy (ME) algorithm. Our named entity recognition approach differs from most previous approaches: one of several differences between this model and most earlier ME-based models is that it uses probabilistic feature functions instead of binary feature functions. We also explore several new features in the model, including confidence functions and feature positions. As in some previous work, we use separate sub-models for Chinese person names and foreign names, but we introduce some new techniques in these sub-models. The experimental results are promising. Moreover, this work is the first to apply the ME algorithm to extracting relations between entities in Chinese text. Twelve features were designed for this task, covering morphological, grammatical and semantic information. The experimental results are satisfactory. Finally, both research results were incorporated into our information extraction system, achieving the goal of providing an information service from unstructured text.
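The distinction between binary and probabilistic feature functions can be illustrated with a minimal sketch of an ME conditional model, p(y | x) = exp(Σ λ_i f_i(x, y)) / Z(x). This is not the authors' implementation: the labels ("PER", "O"), the feature functions `f_title` and `f_surname`, the input fields `prev_is_title` and `surname_prob`, and the weights are all hypothetical, chosen only to show a feature that returns a real value in [0, 1] alongside a conventional 0/1 feature.

```python
import math

def me_distribution(x, labels, features, weights):
    """Return p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x) for every label y."""
    scores = {y: sum(w * f(x, y) for f, w in zip(features, weights)) for y in labels}
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores.values())
    exp_scores = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(x)
    return {y: v / z for y, v in exp_scores.items()}

# Hypothetical binary feature: fires (1.0) only when the previous token
# is a title word and the candidate label is PER; otherwise 0.0.
def f_title(x, y):
    return 1.0 if y == "PER" and x["prev_is_title"] else 0.0

# Hypothetical probabilistic feature: returns a real-valued surname
# likelihood in [0, 1] instead of a 0/1 indicator.
def f_surname(x, y):
    return x["surname_prob"] if y == "PER" else 0.0

labels = ["PER", "O"]
features = [f_title, f_surname]
weights = [1.5, 2.0]  # hypothetical trained weights lambda_i

x = {"prev_is_title": False, "surname_prob": 0.9}
p = me_distribution(x, labels, features, weights)
print(p)
```

Here the binary feature contributes nothing (no preceding title word), while the probabilistic feature still contributes 2.0 × 0.9 to the PER score, so p(PER | x) comes out at roughly 0.86. A binary version of the surname feature would have had to threshold the likelihood, discarding the graded evidence.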
Keywords :
feature extraction; maximum entropy methods; natural language processing; text analysis; Chinese domain; Chinese person names; entity recognition; foreign names; grammar; information extraction; maximum entropy algorithm; morphology; relation extraction; semantic feature; unstructured text; Data engineering; Data mining; Design engineering; Entropy; Morphology; Natural languages; Power engineering and energy; Prototypes; Spatial databases; Text recognition;
Conference_Title :
Signal Processing, 2006 8th International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7803-9736-3
Electronic_ISBN :
0-7803-9736-3
DOI :
10.1109/ICOSP.2006.345822