Regional Information Center for Science and Technology - Personal Information Extraction of the Teaching Staff Based on CRFs

DocumentCode :

3681043

Title :

Personal Information Extraction of the Teaching Staff Based on CRFs

Author :

Fang Dong;Junao Wang

Author_Institution :

Sch. of Comput., Wuhan Univ., Wuhan, China

fYear :

2015

Firstpage :

615

Lastpage :

617

Abstract :

Because the attribute information in a profile web page is usually written in natural language, it is very difficult to extract the target information from the HTML structure alone. This paper adopts Conditional Random Fields (CRFs) to extract personal attribute information from personal detail pages. A segmentation system first divides the HTML document into a sequence of words; an appropriate feature template is then established and the sample sequences are used for training; finally, the feature-function model generated by the CRFs labels the test sequences and identifies the information that needs to be extracted. The experimental results show that the labeling and inference capabilities of CRFs over text sequences can be used to extract specific attribute information from personal home pages very well.
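
The pipeline described in the abstract (segment the page text into tokens, describe each token with a feature template, then read the extracted attributes off the label sequence) can be sketched as follows. This is only an illustrative sketch: the record does not give the paper's actual feature template, label set, or toolkit, so the regex tokenizer, the feature names (`w0`, `w-1`, `w+1`, `is_title`, `is_digit`, `has_at`), the BIO labels (`B-NAME`, `I-NAME`, `B-EMAIL`), and the sample profile text are all assumptions. The CRF training and decoding step itself is left out; the labels here are written by hand to stand in for what a trained model might output.

```python
import re


def tokenize(text):
    """Split text into tokens (a stand-in for the paper's segmentation system).

    E-mail addresses are kept as single tokens; everything else is split
    into word runs and individual punctuation marks.
    """
    return re.findall(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9.-]+|\w+|[^\w\s]", text)


def token_features(tokens, i):
    """Build features for token i from a +/-1 window.

    The feature names are hypothetical; they only illustrate the kind of
    information a CRF feature template usually exposes.
    """
    tok = tokens[i]
    return {
        "w0": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "has_at": "@" in tok,
        "w-1": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "w+1": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sequence_features(tokens):
    """Feature dictionaries for a whole token sequence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def spans_from_labels(tokens, labels):
    """Collect (attribute, text) pairs from a BIO label sequence."""
    spans, current, kind = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A new attribute starts; close any span that is still open.
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [tok], lab[2:]
        elif lab.startswith("I-") and kind == lab[2:]:
            current.append(tok)
        else:
            # "O" or an inconsistent "I-" label ends the open span.
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [], None
    if current:
        spans.append((kind, " ".join(current)))
    return spans


# Hypothetical profile text and hand-written labels (not model output).
tokens = tokenize("Name: Li Wei Email: liwei@example.edu")
labels = ["O", "O", "B-NAME", "I-NAME", "O", "O", "B-EMAIL"]
features = sequence_features(tokens)
extracted = spans_from_labels(tokens, labels)
```

In a full system the `features` list would be passed to a CRF toolkit for training on annotated pages, and `labels` would come from that model's prediction on unseen pages rather than being supplied by hand.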

Keywords :

"Data mining","Hidden Markov models","Feature extraction","Training","Speech","Web pages"

Publisher :

IEEE

Conference_Title :

Network and Information Systems for Computers (ICNISC), 2015 International Conference on

Type :

conf

DOI :

10.1109/ICNISC.2015.124

Filename :

7311964

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3681043