DocumentCode :
2226728
Title :
A hybrid method for Web data extraction
Author :
Wang, Yu ; Zhou, Lizhu
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
417
Lastpage :
420
Abstract :
Web data extraction refers to the technology that helps people find desired information on the Web. We first classify existing data extraction algorithms into two classes, top-down and bottom-up, and then analyze their strengths and weaknesses in terms of extraction accuracy. On the basis of this analysis, we present a hybrid algorithm, bi-direction data extraction (BiDDE for short), which combines the strengths of both top-down and bottom-up algorithms while avoiding their weaknesses. The experimental results show that BiDDE not only achieves higher accuracy than the top-down and bottom-up algorithms but also has satisfactory performance.
Keywords :
Internet; hypermedia markup languages; information retrieval; tree searching; HTML documents; Web data extraction; bi-direction data extraction algorithm; bottom-up algorithms; top-down algorithms; Algorithm design and analysis; Bidirectional control; Computer science; Data mining; Databases; HTML; Particle separators; Web pages; XML
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241229
Filename :
1241229