Title :
An Efficient R-G-B Algorithm for Web Crawler on Information Extraction
Author :
Sun, Huiye ; Li, Yaoguo ; Lin, Shan ; Zhu, Mingying
Author_Institution :
Coll. of Software, Nankai Univ., Tianjin, China
Abstract :
The emergence of search engines set off an unprecedented storm of information. In recent years, vertical search has emerged as a new breakthrough built on general search engines; unlike general search, it depends on a pre-analysis step. A successful vertical search engine must therefore be based on accurate extraction of a wide variety of Web information. However, unstable hit rates and low average accuracy are common problems in current vertical search engines. After analyzing the existing approaches, we present a new "RGB algorithm" that meets the accuracy and efficiency requirements vertical search engines place on information extraction, and we report preliminary experimental results showing that the algorithm addresses these problems efficiently.
Keywords :
information retrieval; search engines; R-G-B algorithm; Web crawler; information extraction; vertical search engine; Computer science; Crawlers; Data mining; Entropy; Hidden Markov models; Information processing; Search engines; Software algorithms; Sun; Uncertainty; Communication Entropy; Information Extraction; Information content; Web Crawler;
Conference_Title :
Computer Science and Computational Technology, 2008. ISCSCT '08. International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3746-7
DOI :
10.1109/ISCSCT.2008.166