Title :
Metadata Extraction from Chinese Research Papers Based on Conditional Random Fields
Author :
Yu, Jiangde ; Fan, Xiaozhong
Author_Institution :
Beijing Inst. of Technol., Beijing
Abstract :
As more and more research papers appear on the Internet, accurately extracting metadata from their headers and citations becomes increasingly important. This paper proposes a method based on conditional random fields (CRFs) for automatically extracting metadata from Chinese research papers. The key elements of the algorithm are parameter estimation and feature selection. We use the L-BFGS algorithm for parameter estimation, analyze three classes of features, and perform feature induction. During processing, the method uses the format information of list separators and special labels to segment the text, and then applies CRFs to extract metadata from the papers. We compare the metadata extraction performance of CRFs on English and Chinese datasets, and we also compare two models, CRFs and hidden Markov models (HMMs), on Chinese datasets. Experimental results show that CRFs perform better than HMMs.
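The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the separator set, the feature names, and the sample line are all assumptions made for illustration, since the abstract does not specify them. It only shows the general idea of splitting a header line at list separators and deriving simple per-chunk features of the kind a CRF could consume.

```python
import re

# Hypothetical separator set (ASCII and full-width); the abstract does not
# list the exact separators or special labels the method relies on.
SEPARATORS = r"[;,；，、]"


def segment(line):
    """Split a header or citation line into non-empty chunks at list separators."""
    return [chunk.strip() for chunk in re.split(SEPARATORS, line) if chunk.strip()]


def features(chunk):
    """Return illustrative per-chunk features (names are assumptions)."""
    return {
        "has_digit": any(ch.isdigit() for ch in chunk),
        "has_at_sign": "@" in chunk,
        # CJK Unified Ideographs basic block only
        "has_cjk": any("\u4e00" <= ch <= "\u9fff" for ch in chunk),
        "length": len(chunk),
    }


# Invented sample line, not taken from the paper's datasets.
line = "Zhang San; Li Si; Example University, Beijing 100000"
chunks = segment(line)
feature_rows = [features(c) for c in chunks]
```

In a full pipeline, each chunk's feature dictionary would be passed to a sequence labeler that assigns labels such as author or affiliation; that labeling step is omitted here.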
Keywords :
feature extraction; hidden Markov models; meta data; text analysis; Chinese datasets; Chinese research papers; Internet; conditional random fields; feature induction; feature selection; hidden Markov model; metadata extraction; parameter estimation; Citation analysis; Computer science; Data mining; Educational institutions; Hidden Markov models; Internet; Parameter estimation; Particle separators; Performance analysis; World Wide Web;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
DOI :
10.1109/FSKD.2007.394