DocumentCode
461677
Title
The Research and Application about the Information Extraction in Chinese Domain
Author
Zhang, Suxiang ; Wen, Juan ; Qin, Ying ; Wang, Xiaojie ; Zhong, Yixin
Author_Institution
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Volume
3
fYear
2006
fDate
16-20 2006
Abstract
This paper proposes a prototype information service system that extracts information of interest from unstructured text and delivers it to users through database search. To achieve this goal, two fundamental problems, named entity recognition and relation extraction, are studied using the maximum entropy (ME) algorithm. Our named entity recognition approach differs from most previous approaches in several ways. One difference is that probabilistic feature functions are used instead of the binary feature functions found in most previous ME-based models. We also explore several new features, including confidence functions and feature position. As in some previous work, we use separate sub-models for Chinese person names and foreign names, but we introduce some new techniques in these sub-models. The experimental results are promising. Moreover, the ME algorithm is applied for the first time to extracting relations between entities in Chinese text. Twelve features are designed, covering morphological, grammatical, and semantic information. The experimental results are satisfactory. Both results are therefore integrated into our information extraction system, achieving the goal of providing an information service from unstructured text.
Keywords
feature extraction; maximum entropy methods; natural language processing; text analysis; Chinese domain; Chinese person names; entity recognition; foreign names; grammar; information extraction; maximum entropy algorithm; morphology; relation extraction; semantic feature; unstructured text; Data engineering; Data mining; Design engineering; Entropy; Morphology; Natural languages; Power engineering and energy; Prototypes; Spatial databases; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing, 2006 8th International Conference on
Conference_Location
Beijing
Print_ISBN
0-7803-9736-3
Electronic_ISBN
0-7803-9736-3
Type
conf
DOI
10.1109/ICOSP.2006.345822
Filename
4129211