Title :
An efficient wrapper for Web data extraction and its application
Author :
Zhang, Suzhi ; Shi, Peizhong
Author_Institution :
Coll. of Comput. & Commun. Eng., Zhengzhou Univ. of Light Ind., Zhengzhou, China
Abstract :
A Web wrapper extracts data from given Web sources according to their corresponding extraction rules. Its design is a key technology for Web information extraction and integration. This paper describes the design and implementation of a Web wrapper based on a pre-defined schema. It then validates the wrapper by extracting data from the new-book information Web pages of several publishing companies and analyzes the extraction results. We find that the wrapper accurately extracts the data from these Web sources, and we conclude that the proposed wrapper is feasible, efficient, and maintainable. It will be applied to wrapper/mediator-based Web data integration, on which we rely to develop a Web application for book information integration and querying.
Keywords :
Internet; data handling; query processing; Web data extraction; Web data integration; Web information extraction; Web information integration; Web sources; Web wrapper; books information Web pages; query system; Application software; Books; Computer science; Computer science education; Data engineering; Data mining; Displays; Educational institutions; HTML; Web pages; Wrapper; extraction rule; information extraction; new book information
Conference_Title :
Computer Science & Education, 2009. ICCSE '09. 4th International Conference on
Conference_Location :
Nanning
Print_ISBN :
978-1-4244-3520-3
Electronic_ISBN :
978-1-4244-3521-0
DOI :
10.1109/ICCSE.2009.5228403