DocumentCode :
249856
Title :
Using Visual Clues Concept for Extracting Main Data from Deep Web Pages
Author :
Pusdekar, Satish J. ; Chhaware, Shaikh Phiroj
Author_Institution :
Dept. of Comput. Technol., Priyadarshani Coll. of Eng., Nagpur, India
fYear :
2014
fDate :
9-11 Jan. 2014
Firstpage :
190
Lastpage :
193
Abstract :
Extracting data from deep Web pages is a challenging problem because of the intricate structures underlying such pages. Many techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. However, the contents of Web pages are always displayed regularly so that users can browse them. There are different ways to perform deep Web data extraction that overcome the limitations of previous work by exploiting some interesting visual features common to deep Web pages. In this paper, a vision-based approach that is Web-page-programming-language-independent is proposed. The approach uses the visual features of Web pages to extract data from deep Web pages, covering both data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the human effort needed to produce an exact extraction of the data. Our implementation, evaluated on a large set of Web databases, shows that the proposed vision-based approach is highly effective for extracting data from deep Web pages.
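The abstract describes extraction driven by visual features rather than by page markup, together with a "revision" measure of manual effort. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' method. It assumes each rendered text block is reduced to a hypothetical `Block` (left edge `x`, top `y`, `height`, `text`). It also assumes two visual clues: data records share a dominant left edge, and records are separated by larger vertical gaps than items inside a record. `revision` is assumed here to mean the percentage of Web databases whose extraction is not exact (precision or recall below 100%). All names and thresholds (`extract_records`, `align_tol`, `gap_threshold`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered text block with its visual features (hypothetical model)."""
    x: int        # left edge in pixels
    y: int        # top edge in pixels
    height: int   # block height in pixels
    text: str


def extract_records(blocks, align_tol=5, gap_threshold=15):
    """Group visually aligned blocks into data records (hypothetical sketch).

    Assumed visual clues: records share a common left edge, and the
    vertical gap between records is larger than the gap inside a record.
    """
    if not blocks:
        return []
    # Dominant left edge: the x coordinate shared by the most blocks.
    left_edge = Counter(b.x for b in blocks).most_common(1)[0][0]
    # Drop blocks that are not left-aligned with the data region (e.g. ads).
    aligned = [b for b in blocks if abs(b.x - left_edge) <= align_tol]
    aligned.sort(key=lambda b: b.y)

    records = [[aligned[0]]]
    for prev, cur in zip(aligned, aligned[1:]):
        gap = cur.y - (prev.y + prev.height)
        if gap > gap_threshold:
            records.append([cur])      # large gap: start a new record
        else:
            records[-1].append(cur)    # small gap: same record
    # Data items of each record are the texts of its blocks, top to bottom.
    return [[b.text for b in rec] for rec in records]


def revision(results):
    """Percentage of Web databases not extracted exactly (assumed definition).

    results: list of (precision, recall) pairs, one per Web database.
    """
    if not results:
        return 0.0
    imperfect = sum(1 for p, r in results if p < 1.0 or r < 1.0)
    return 100.0 * imperfect / len(results)
```

For example, four left-aligned blocks at `x=20` with tops at 100, 114, 150 and 164 (height 12) plus an ad block at `x=300` yield two records, `[["Title A", "$10"], ["Title B", "$12"]]`, because only the 24-pixel gap exceeds the threshold. A real system would derive these features from the browser's rendered layout and combine more clues (font, shape, content type) than this sketch does.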
Keywords :
Internet; computer vision; data mining; Web page programming-language-independent approach; Web-page-programming-language-dependent; data extraction; data mining; deep Web pages; vision-based approach; visual clues concept; Data mining; Databases; Feature extraction; HTML; Noise; Visualization; Web pages; visual features for web pages; web data extraction; web data mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014 International Conference on
Conference_Location :
Nagpur
Type :
conf
DOI :
10.1109/ICESC.2014.39
Filename :
6745371