DocumentCode :
1809595
Title :
Information Extraction incorporating Paragraph Feature and Hidden Markov Model
Author :
Na, Liu ; Mingyu, Lu ; Huanling, Tang
fYear :
2007
fDate :
18-21 Sept. 2007
Firstpage :
953
Lastpage :
956
Abstract :
With the continuous growth of data on the Internet, information extraction has become a fundamental and effective means of handling large quantities of text. This paper proposes an information extraction method that incorporates paragraph features into a hidden Markov model. The method takes paragraphs rather than words as the unit of analysis; a paragraph is a text sequence saved from a web page after preprocessing. Each paragraph is converted into special tokens, which serve as the observation symbols of the hidden Markov model. All experiments are carried out on the EBM Web page set. The extracted information includes title, author, affiliation, journal, etc. The experimental results show that this method improves precision and recall to some degree.
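The abstract does not give the token scheme or the model parameters, so the following is only a minimal sketch of the general idea: each paragraph is mapped to one coarse token, and a hidden Markov model decoded with the Viterbi algorithm assigns a field label to each paragraph. The token names (PLAIN, NAME_LIST, ORG, HAS_DIGIT), the rules in paragraph_token, the helper peaked, and every probability are assumptions made for illustration; they are not taken from the paper.

```python
import math

# Field labels named in the abstract, used here as hidden states.
STATES = ["title", "author", "affiliation", "journal"]
# Hypothetical paragraph tokens (observation symbols); not from the paper.
TOKENS = ["PLAIN", "NAME_LIST", "ORG", "HAS_DIGIT"]


def paragraph_token(paragraph: str) -> str:
    """Map one paragraph to one coarse token (assumed, illustrative rules)."""
    text = paragraph.strip()
    if "@" in text or "University" in text or "Department" in text:
        return "ORG"
    if any(ch.isdigit() for ch in text):
        return "HAS_DIGIT"
    if "," in text or ";" in text:
        return "NAME_LIST"
    return "PLAIN"


def peaked(keys, peak, high=0.7):
    """Build a distribution with mass `high` on `peak`, the rest spread evenly."""
    low = (1.0 - high) / (len(keys) - 1)
    return {k: (high if k == peak else low) for k in keys}


# Made-up parameters; a real system would estimate them from labeled pages.
START = peaked(STATES, "title")
TRANS = {
    "title": peaked(STATES, "author"),
    "author": peaked(STATES, "affiliation"),
    "affiliation": peaked(STATES, "journal"),
    "journal": peaked(STATES, "journal"),
}
EMIT = {
    "title": peaked(TOKENS, "PLAIN"),
    "author": peaked(TOKENS, "NAME_LIST"),
    "affiliation": peaked(TOKENS, "ORG"),
    "journal": peaked(TOKENS, "HAS_DIGIT"),
}


def viterbi(obs, states, start, trans, emit):
    """Return the most probable state sequence for a non-empty token list."""
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def label_paragraphs(paragraphs):
    """Tokenize each paragraph, then decode one field label per paragraph."""
    tokens = [paragraph_token(p) for p in paragraphs]
    return viterbi(tokens, STATES, START, TRANS, EMIT)


if __name__ == "__main__":
    # The last two paragraphs are invented placeholders.
    page = [
        "Information Extraction incorporating Paragraph Feature and Hidden Markov Model",
        "Na, Liu ; Mingyu, Lu ; Huanling, Tang",
        "Department of Computer Science, Example University",
        "Example Journal of Medicine 2007; 12(3)",
    ]
    for label, paragraph in zip(label_paragraphs(page), page):
        print(f"{label}: {paragraph}")
```

Working at paragraph level keeps the observation sequence short (one symbol per paragraph rather than one per word), which is presumably part of the motivation for the approach; how the paper actually derives its tokens and trains the model is not stated in this record.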
Keywords :
feature extraction; hidden Markov models; information extraction; paragraph feature; Automata; Computer science; Data mining; Filling; IP networks; Parallel processing; Spatial databases; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
Type :
conf
DOI :
10.1109/NPC.2007.109
Filename :
4351609