Title :
Information Extraction incorporating Paragraph Feature and Hidden Markov Model
Author :
Na, Liu ; Mingyu, Lu ; Huanling, Tang
Abstract :
As the volume of data on the Internet continues to grow, information extraction has become a fundamental and effective means of handling large quantities of text. This paper proposes an information extraction method that incorporates paragraph features into a hidden Markov model. The method takes paragraphs, rather than words, as the unit of analysis; a paragraph is a text sequence saved from a Web page after preprocessing. Each paragraph is converted into special tokens, which serve as the observation symbols of the hidden Markov model. All experiments are carried out on a set of EBM Web pages. The extracted information includes title, author, affiliation, journal, and so on. The experimental results show that this method can improve precision and recall to some degree.
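The pipeline described in the abstract (paragraph, then token, then hidden Markov model state) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the token names (`TEXT`, `NAMES`, `ORG`, `NUM`), the rules in `paragraph_to_token`, and the hand-set probability tables are all hypothetical. Only the four field names come from the abstract. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# Hidden states: the fields named in the abstract.
STATES = ["title", "author", "affiliation", "journal"]

def paragraph_to_token(paragraph: str) -> str:
    """Map a paragraph to one coarse observation symbol (hypothetical rules)."""
    text = paragraph.strip()
    if any(k in text.lower() for k in ("university", "department", "institute")):
        return "ORG"
    if any(ch.isdigit() for ch in text):
        return "NUM"
    if "," in text and len(text.split()) <= 12:
        return "NAMES"
    return "TEXT"

# Hand-set probabilities, for illustration only; a real system would
# estimate these from labeled pages.
START = {"title": 0.7, "author": 0.1, "affiliation": 0.1, "journal": 0.1}
TRANS = {
    "title":       {"title": 0.10, "author": 0.70, "affiliation": 0.10, "journal": 0.10},
    "author":      {"title": 0.05, "author": 0.15, "affiliation": 0.70, "journal": 0.10},
    "affiliation": {"title": 0.05, "author": 0.10, "affiliation": 0.25, "journal": 0.60},
    "journal":     {"title": 0.10, "author": 0.10, "affiliation": 0.10, "journal": 0.70},
}
EMIT = {
    "title":       {"TEXT": 0.7, "NAMES": 0.1, "ORG": 0.1, "NUM": 0.1},
    "author":      {"TEXT": 0.1, "NAMES": 0.7, "ORG": 0.1, "NUM": 0.1},
    "affiliation": {"TEXT": 0.1, "NAMES": 0.1, "ORG": 0.7, "NUM": 0.1},
    "journal":     {"TEXT": 0.1, "NAMES": 0.1, "ORG": 0.1, "NUM": 0.7},
}

def viterbi(tokens):
    """Return the most likely state sequence for a list of paragraph tokens."""
    if not tokens:
        return []
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][tokens[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for tok in tokens[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][tok])
            ptr[s] = best
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    page = [
        "Information Extraction incorporating Paragraph Feature and Hidden Markov Model",
        "Na Liu, Mingyu Lu, Huanling Tang",
        "Department of Computer Science, Example University",
        "Example Journal of Medicine, 2007",
    ]
    tokens = [paragraph_to_token(p) for p in page]
    for paragraph, label in zip(page, viterbi(tokens)):
        print(f"{label:12s} {paragraph}")
```

Treating each paragraph as a single observation keeps the sequence short (one symbol per paragraph instead of one per word), which is the design choice the abstract emphasizes; the affiliation and journal lines in the usage example are made-up placeholders.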
Keywords :
feature extraction; hidden Markov models; information extraction; paragraph feature; Automata; Computer science; Data mining; Filling; IP networks; Parallel processing; Spatial databases; Web pages;
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
DOI :
10.1109/NPC.2007.109