DocumentCode :
3681043
Title :
Personal Information Extraction of the Teaching Staff Based on CRFs
Author :
Fang Dong;Junao Wang
Author_Institution :
Sch. of Comput., Wuhan Univ., Wuhan, China
fYear :
2015
Firstpage :
615
Lastpage :
617
Abstract :
Because the attribute information of a profile stored in a web page is usually expressed in natural language, it is very difficult to extract the target information from the HTML structure alone. In this paper, Conditional Random Fields (CRFs) are adopted to extract personal attribute information from personal detail pages on the web. A segmentation system first divides the HTML document into a sequence of words. An appropriate feature template is then established and the model is trained on the sample sequences. Finally, the feature function model generated by the CRFs is used to label the test sequences and identify the information that needs to be extracted. The experimental results show that the annotation and inference capabilities of CRFs on text sequences can be used to extract specific attribute information from personal home pages very well.
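The first two stages named in the abstract (splitting a page into a word sequence, then applying a feature template) can be illustrated with a minimal sketch. This is not the authors' implementation: the regex tokenizer is a stand-in for their unspecified segmentation system and assumes whitespace-delimited text, and the function names, the window size of 1, and the particular features are all hypothetical choices made for illustration. The resulting per-token feature dictionaries are the kind of input a linear-chain CRF trainer typically consumes.

```python
import re


def tokenize(html):
    """Strip HTML tags and split the remaining text into tokens.

    Assumption: a simple regex stands in for the paper's segmentation
    system, which the abstract does not specify.
    """
    text = re.sub(r"<[^>]+>", " ", html)
    # Words are runs of word characters; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def token_features(tokens, i, window=1):
    """Build a feature dict for token i from a +/- `window` context.

    Hypothetical template: lowercased word identity, digit flag, and
    title-case flag at each offset, plus padding markers at the edges.
    """
    feats = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            tok = tokens[j]
            feats[f"w[{offset}]={tok.lower()}"] = 1.0
            feats[f"is_digit[{offset}]"] = float(tok.isdigit())
            feats[f"is_title[{offset}]"] = float(tok.istitle())
        else:
            # Mark positions that fall outside the sequence.
            feats[f"pad[{offset}]"] = 1.0
    return feats


def sequence_features(html):
    """Return the token sequence and one feature dict per token."""
    tokens = tokenize(html)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sequence_features("<p>Prof. Li Wei, Wuhan</p>")` yields six tokens with one feature dictionary each. Training and labeling would then be handled by a CRF toolkit, which is outside the scope of this sketch.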
Keywords :
"Data mining","Hidden Markov models","Feature extraction","Training","Speech","Web pages"
Publisher :
IEEE
Conference_Title :
Network and Information Systems for Computers (ICNISC), 2015 International Conference on
Type :
conf
DOI :
10.1109/ICNISC.2015.124
Filename :
7311964