DocumentCode
2259119
Title
Design and Implementation of a Web Information Extraction System Based on R-G-B Algorithm
Author
Li, Yaoguo ; Sun, Huiye ; Lin, Shan ; Zhu, Mingying
Author_Institution
College of Software, Nankai Univ., Tianjin
Volume
1
fYear
2008
fDate
20-22 Dec. 2008
Firstpage
254
Lastpage
258
Abstract
With the enormous growth of the World Wide Web in recent years, extracting information from web pages efficiently, accurately and flexibly has become an important challenge for web crawler designers. Unlike many other approaches, the "R-G-B" algorithm is a new algorithm that can meet the accuracy and efficiency requirements that search engines place on information extraction. In this paper, we describe the design and implementation of a web information extraction system module based on this algorithm. We present the architecture of the system and report preliminary experimental results showing that the system addresses robustness, flexibility and accuracy at low cost.
Keywords
Web sites; information retrieval; search engines; R-G-B algorithm; Web information extraction system; Web pages; World Wide Web; Algorithm design and analysis; Costs; Crawlers; Data mining; Hidden Markov models; Robustness; Web mining; Design and Implementation; Information Extraction; Web Crawler
fLanguage
English
Publisher
IEEE
Conference_Title
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location
Shanghai
Print_ISBN
978-0-7695-3497-8
Type
conf
DOI
10.1109/IITA.2008.388
Filename
4739574