DocumentCode :
1786316
Title :
A Rendering-Based Method for Selecting the Main Data Region in Web Pages
Author :
Neiva Lopes Figueiredo, Leandro ; Almeida Ferreira, Anderson ; Tavares de Assis, Guilherme
Author_Institution :
Dept. de Comput., Univ. Fed. de Ouro Preto, Ouro Preto, Brazil
fYear :
2014
fDate :
22-24 Oct. 2014
Firstpage :
24
Lastpage :
32
Abstract :
Extracting data from web pages is an important task for several applications, such as comparison shopping and data mining. Much of that data is provided by search result pages, in which each result, called a search result record, represents a record from a database. One of the most important steps in extracting such records is identifying, among the different data regions of a page, the one that contains the records to be extracted. An incorrect identification of this region may lead to an incorrect extraction of the search result records. In this paper, we propose a simple but efficient method that generates a path expression to select the main data region of a given page, based on the rendering area information of its elements. The generated path expression may be used by wrappers to extract the search result records and their data units, reducing the wrappers' complexity and increasing their accuracy. Experimental results using web pages from several domains show that the method is highly effective.
Keywords :
Internet; information retrieval; rendering (computer graphics); Web pages; comparison shopping; data extraction; data mining; data region; data units; path expression; rendering area information; rendering-based method; search result record; Accuracy; Browsers; HTML; Visualization; main data region; rendering information; visual information; wrapper
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Congress (LA-WEB), 2014 9th Latin American
Conference_Location :
Ouro Preto
Print_ISBN :
978-1-4799-6952-4
Type :
conf
DOI :
10.1109/LAWeb.2014.14
Filename :
7000168