Regional Information Center for Science and Technology (RICeST) - Web Data Extraction Based on Simple Tree Matching

DocumentCode :

2053578

Title :

Web Data Extraction Based on Simple Tree Matching

Author :

Wang, Hua ; Zhang, Yang

Author_Institution :

Coll. of Inf. Eng., Northwest A&F Univ., Yangling, China

Volume :

fYear :

2010

fDate :

14-15 Aug. 2010

Firstpage :

Lastpage :

Abstract :

Information on the Internet has grown exponentially, and Internet users are overwhelmed by it. How to automatically extract useful information from relevant pages, so as to provide users with a convenient and fast information query platform, is an important issue. In this paper, we present a Web data extraction method based on the simple tree matching algorithm that analyzes the structure and content of Web documents. Experimental results on Web data from several well-known websites show that the proposed method can effectively extract data records from similar Web pages, with an extraction precision of about 90%, and can meet the requirements of accurate data extraction in real-life applications.
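
The record does not include the paper's implementation, so the following is only a minimal sketch of the simple tree matching algorithm as it is commonly described: two trees match only if their roots carry the same label, and the children are then aligned in order by dynamic programming. The `Node` class, the function names, and the averaged-size normalization in `similarity` are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the size of a maximum order-preserving matching between two trees."""
    # Roots with different labels cannot be matched at all.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # best[i][j] = best matching between the first i children of a
    # and the first j children of b.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            best[i][j] = max(best[i - 1][j], best[i][j - 1], paired)
    # Add 1 for the matched pair of roots.
    return best[m][n] + 1


def tree_size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(tree_size(child) for child in node.children)


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched pairs divided by the average tree size."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)


# Two table rows that differ by one extra cell.
row_a = Node("tr", [Node("td"), Node("td"), Node("a")])
row_b = Node("tr", [Node("td"), Node("a")])
print(simple_tree_matching(row_a, row_b))  # 3: tr-tr, td-td, a-a
print(round(similarity(row_a, row_b), 3))
```

In a data-record extractor, a score like this would typically be compared against a threshold to decide whether two DOM subtrees are instances of the same record template; the threshold value is a tuning choice that this record does not specify.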

Keywords :

Web services; data mining; query processing; trees (mathematics); Internet; Web data extraction method; Web documents; Web pages; Web sites; information query platform; simple tree matching algorithm; Artificial intelligence; Books; Data mining; Feature extraction; HTML; Heuristic algorithms; Web pages; DOM; Information Extraction; Simple tree matching; XPath;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Engineering (ICIE), 2010 WASE International Conference on

Conference_Location :

Beidaihe, Hebei

Print_ISBN :

978-1-4244-7506-3

Electronic_ISBN :

978-1-4244-7507-0

Type :

conf

DOI :

10.1109/ICIE.2010.100

Filename :

5571205

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2053578