DocumentCode :
2539453
Title :
A Query Keywords Based Approach for Noisy Data Elimination
Author :
Wang, Ying-Kui ; Tan, Qian-Mao
Author_Institution :
Experimentation Teaching Center of Comput., Tianjin Univ., Tianjin, China
fYear :
2012
fDate :
12-14 Oct. 2012
Firstpage :
508
Lastpage :
510
Abstract :
Eliminating noisy data is important for information extraction on the deep web. In this paper, we propose a new approach called ENDW (Eliminating Noisy Data in Web pages), based on query keywords and DOM tools, to eliminate noisy data. Query keywords submitted to backend databases always appear in deep web pages. The boundary between the useful data region and the noisy data region is related to the positions where the query keywords appear. Once this boundary is found, the useful data region can be retained and the noisy data region eliminated. Our experiments show that the approach is effective and stable.
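The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ENDW algorithm, whose details the abstract does not give. It assumes, for illustration only, that the useful data region is the smallest DOM subtree containing every text occurrence of the query keyword, and that everything outside that subtree is noise. The names `Node`, `TreeBuilder`, `useful_region`, `text_of` and the sample `PAGE` are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A very small DOM-like node: tag, attributes, direct text, children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def path_from_root(node):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def useful_region(html, keyword):
    """ASSUMPTION: the useful region is the lowest common ancestor of all
    nodes whose direct text contains the query keyword (case-insensitive)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    hits, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        if keyword.lower() in node.text.lower():
            hits.append(node)
        stack.extend(node.children)
    if not hits:
        return None

    common = None
    for level in zip(*(path_from_root(n) for n in hits)):
        if all(n is level[0] for n in level):
            common = level[0]
        else:
            break
    return common


def text_of(node):
    """Whitespace-normalized text of a subtree (not strict document order)."""
    parts = [node.text] + [text_of(child) for child in node.children]
    return " ".join(" ".join(parts).split())


# Hypothetical result page returned for the query "deep web".
PAGE = """
<html><body>
  <div id="nav"><a>Home</a><a>Advertise</a></div>
  <div id="results">
    <p>Deep web crawling survey</p>
    <p>Indexing the deep web</p>
  </div>
  <div id="footer">Copyright notice</div>
</body></html>
"""

region = useful_region(PAGE, "deep web")
print(region.attrs.get("id"))  # → results
```

In this sketch the navigation and footer blocks fall outside the selected subtree and are discarded. A keyword echoed in a search box or page title would widen the region, so a practical system would need a more careful boundary rule than this one.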
Keywords :
Internet; data handling; database management systems; query processing; DOM tools; ENDW; backend databases; deep Web pages; information extraction; noisy data elimination; noisy data region; query keywords based approach; useful data region; Data mining; Databases; HTML; Noise measurement; Visualization; Web pages; deep web; noisy data elimination; web information extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Business Computing and Global Informatization (BCGIN), 2012 Second International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-4469-2
Type :
conf
DOI :
10.1109/BCGIN.2012.138
Filename :
6382579