DocumentCode :
2094054
Title :
An Efficient R-G-B Algorithm for Web Crawler on Information Extraction
Author :
Sun, Huiye ; Li, Yaoguo ; Lin, Shan ; Zhu, Mingying
Author_Institution :
Coll. of Software, Nankai Univ., Tianjin, China
Volume :
1
fYear :
2008
fDate :
20-22 Dec. 2008
Firstpage :
640
Lastpage :
643
Abstract :
The emergence of search engines set off an unprecedented storm of information. In recent years, vertical search has emerged as a new breakthrough built on top of general search engines; unlike general search engines, vertical search depends on pre-analysis of Web content. A successful vertical search engine must therefore rest on accurate extraction of a wide variety of Web information. However, unstable hit rates and low average accuracy are common problems in current vertical search engines. After analyzing existing approaches, we present a new "RGB algorithm" that meets the accuracy and efficiency requirements vertical search engines place on information extraction, and we report preliminary experimental results indicating that the algorithm addresses these issues efficiently.
Keywords :
information retrieval; search engines; R-G-B algorithm; Web crawler; information extraction; vertical search engine; Computer science; Crawlers; Data mining; Entropy; Hidden Markov models; Information processing; Search engines; Software algorithms; Sun; Uncertainty; Communication Entropy; Information Extraction; Information content; Web Crawler;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2008 International Symposium on Computer Science and Computational Technology (ISCSCT '08)
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3746-7
Type :
conf
DOI :
10.1109/ISCSCT.2008.166
Filename :
4731509