DocumentCode
2850753
Title
Hidden Markov Models and Text Classifiers for Information Extraction on Semi-Structured Texts
Author
Barros, Flavia A. ; Silva, Eduardo F A ; Prudencio, Ricardo B. C. ; Filho, Valmir M. ; Nascimento, André C A
Author_Institution
Center of Inf., Fed. Univ. of Pernambuco, Recife
fYear
2008
fDate
10-12 Sept. 2008
Firstpage
417
Lastpage
422
Abstract
Information extraction (IE) aims to extract from textual documents only the fragments that correspond to data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach for IE that combines text classifiers and hidden Markov models (HMMs). In this approach, a text classifier generates an initial output, which is then refined by an HMM that takes into account dependencies in the order of the data to be extracted. The approach was evaluated on the task of extracting information from bibliographic references. Experiments performed on a corpus of 6000 references showed an improvement in performance over the benchmark IE approaches adopted in previous work.
Keywords
bibliographic systems; hidden Markov models; information retrieval; learning (artificial intelligence); pattern classification; text analysis; bibliographic references; hybrid machine learning approach; information extraction; semistructured texts; text classifiers; textual documents extraction; Data mining; Hybrid intelligent systems; Informatics; Machine learning; Proposals; Text categorization; Web search; Web sites
fLanguage
English
Publisher
IEEE
Conference_Titel
Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on
Conference_Location
Barcelona
Print_ISBN
978-0-7695-3326-1
Electronic_ISBN
978-0-7695-3326-1
Type
conf
DOI
10.1109/HIS.2008.63
Filename
4626665