Title :
Hidden Markov Models and Text Classifiers for Information Extraction on Semi-Structured Texts
Author :
Barros, Flavia A. ; Silva, Eduardo F A ; Prudencio, Ricardo B. C. ; Filho, Valmir M. ; Nascimento, André C A
Author_Institution :
Center of Informatics, Federal University of Pernambuco, Recife
Abstract :
Information extraction (IE) aims to extract from textual documents only the fragments that correspond to the data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach to IE that combines text classifiers and hidden Markov models (HMMs). In this approach, a text classifier generates an initial output, which is then refined by an HMM that takes into account dependencies in the order of the data to be extracted. The approach was evaluated on the task of extracting information from bibliographic references. Experiments on a corpus of 6000 references showed improved performance compared with the benchmark IE approaches adopted in previous work.
Keywords :
bibliographic systems; hidden Markov models; information retrieval; learning (artificial intelligence); pattern classification; text analysis; bibliographic references; hybrid machine learning approach; information extraction; semistructured texts; text classifiers; textual documents extraction; Data mining; Hybrid intelligent systems; Informatics; Machine learning; Proposals; Text categorization; Web search; Web sites
Conference_Title :
Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-0-7695-3326-1
Electronic_ISBN :
978-0-7695-3326-1
DOI :
10.1109/HIS.2008.63
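
Illustrative sketch (not taken from the paper): the abstract describes a two-stage pipeline in which a text classifier labels each fragment of a reference and an HMM then refines those labels using the typical order of the fields. The Python sketch below shows one plausible way to read that refinement step, assuming the classifier's labels are treated as HMM observations and the true fields as hidden states, decoded with the Viterbi algorithm. The field names, the probability tables START, TRANS and EMIT, and the function name viterbi are all hypothetical and hand-set for illustration; in practice they would be estimated from labelled references.

```python
import math

# Hypothetical field labels for fragments of a bibliographic reference.
STATES = ["author", "title", "year"]

# Hypothetical, hand-set probabilities (assumptions, not values from the paper).
# START[s]: probability that a reference begins with field s.
START = {"author": 0.8, "title": 0.15, "year": 0.05}

# TRANS[p][s]: probability that field s follows field p.
TRANS = {
    "author": {"author": 0.6, "title": 0.35, "year": 0.05},
    "title": {"author": 0.05, "title": 0.7, "year": 0.25},
    "year": {"author": 0.1, "title": 0.1, "year": 0.8},
}

# EMIT[s][o]: probability that the classifier outputs label o
# when the true field is s (a confusion-matrix style model).
EMIT = {
    "author": {"author": 0.8, "title": 0.15, "year": 0.05},
    "title": {"author": 0.15, "title": 0.8, "year": 0.05},
    "year": {"author": 0.05, "title": 0.05, "year": 0.9},
}


def viterbi(observed):
    """Return the most likely sequence of true fields given classifier labels."""
    if not observed:
        return []
    # Log-probability of the best path ending in each state at position 0.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observed[0]]) for s in STATES
    }
    backpointers = []
    for obs in observed[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Initial classifier output with an isolated "author" label inside the title.
classifier_output = ["author", "title", "author", "title", "title", "year"]
refined = viterbi(classifier_output)
print(refined)
```

With these toy tables, the isolated "author" label at the third position is relabelled as "title", because a title-to-author-to-title sequence is much less probable under TRANS than a single classifier error under EMIT. A sequence that already follows the expected field order is returned unchanged.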