DocumentCode :
3459967
Title :
A Web Page Segmentation Algorithm for Extracting Product Information
Author :
Wu, Changjun ; Zeng, Guosun ; Xu, Guorong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai
fYear :
2006
fDate :
20-23 Aug. 2006
Firstpage :
1374
Lastpage :
1379
Abstract :
With the rapid development of the Internet, the Web is becoming the most popular and also the largest resource from which people acquire information. At the same time, search engines play an important role in retrieving information. Nevertheless, the smallest processing unit of a search engine is the whole Web page, which contains plenty of noisy information. If the information can be extracted and used as the smallest processing unit, it can have a positive effect on the search engine's precision; page segmentation algorithms were developed for this purpose. However, traditional algorithms cannot extract blocks at the product level. Hence, a novel algorithm based on product features and the DOM (Document Object Model) is proposed. Compared with traditional algorithms, this page segmentation algorithm greatly enhances information consistency and also reduces complexity.
Keywords :
Internet; Web sites; information retrieval; search engines; Web page segmentation; Document Object Model; product information extraction; Computer science; Data mining; Educational products; HTML; High performance computing; Navigation; Web pages; Page Segmentation; Product Block
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Acquisition, 2006 IEEE International Conference on
Conference_Location :
Shandong
Print_ISBN :
1-4244-0528-9
Electronic_ISBN :
1-4244-0529-7
Type :
conf
DOI :
10.1109/ICIA.2006.305954
Filename :
4097887