DocumentCode :
3148354
Title :
Algorithm Research for the Noise of Information Extraction Based Vision and DOM Tree
Author :
Sun, Tieli ; Li, Zhiying ; Liu, Yanji ; Liu, Zhenghong
Author_Institution :
Sch. of Comput. Sci., Northeast Normal Univ., Changchun, China
fYear :
2009
fDate :
15-16 May 2009
Firstpage :
81
Lastpage :
84
Abstract :
Information extraction from Web sites is a relevant problem today and is usually performed by software modules called wrappers. This paper introduces the relevant information extraction technology and describes an approach that works on HTML pages to extract both the theme information and the content. Noise is removed first, by combining visual blocks with a vision-based DOM tree denoising method, in order to improve the efficiency of extraction.
Keywords :
Web sites; hypermedia markup languages; information retrieval; trees (mathematics); HTML pages; Web sites; information extraction; vision-based DOM tree denoising methods; wrappers software modules; Computer science; Computer science education; Data mining; Databases; HTML; Software algorithms; Software performance; Sun; Ubiquitous computing; Web pages; DOM tree; information extraction; match technology; wrapper;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Ubiquitous Computing and Education, 2009 International Symposium on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3619-4
Type :
conf
DOI :
10.1109/IUCE.2009.47
Filename :
5223346