Title :
A Query Keywords Based Approach for Noisy Data Elimination
Author :
Wang, Ying-Kui ; Tan, Qian-Mao
Author_Institution :
Experimentation Teaching Center of Comput., Tianjin Univ., Tianjin, China
Abstract :
Eliminating noisy data is important for information extraction on the deep web. In this paper, we propose a new approach called ENDW (Eliminating Noisy Data in Web pages), which uses query keywords and DOM tools to eliminate noisy data. Query keywords submitted to backend databases always appear in the returned deep web pages. The boundary between the useful data region and the noisy data region is related to the positions where the query keywords appear. Once this boundary is found, the useful data region can be retained and the noisy data region eliminated. Our experiments show that the approach is effective and stable.
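The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ENDW algorithm, whose details are not given here. It assumes a simple heuristic: the useful data region is the deepest DOM element that contains every text node in which a query keyword occurs, and everything outside that element is treated as noise. The function names (`keyword_text_nodes`, `ancestors`, `useful_region`) and the sample page are hypothetical, and the page is assumed to be well-formed markup so that Python's standard `xml.dom.minidom` can parse it.

```python
from xml.dom import minidom


def keyword_text_nodes(node, keywords):
    """Collect text nodes under `node` that contain any query keyword."""
    hits = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            text = child.data.lower()
            if any(k.lower() in text for k in keywords):
                hits.append(child)
        else:
            hits.extend(keyword_text_nodes(child, keywords))
    return hits


def ancestors(node):
    """Return the element ancestors of `node`, ordered from the root down."""
    chain = []
    node = node.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    return chain[::-1]


def useful_region(page, keywords):
    """Assumed heuristic: lowest common ancestor of all keyword occurrences."""
    doc = minidom.parseString(page)
    body = doc.getElementsByTagName("body")[0]
    hits = keyword_text_nodes(body, keywords)
    if not hits:
        return None  # no keyword found, so no boundary can be located
    chains = [ancestors(hit) for hit in hits]
    region = None
    # Walk down the ancestor chains in parallel while they still agree.
    for level in zip(*chains):
        if all(n is level[0] for n in level):
            region = level[0]
        else:
            break
    return region


# Hypothetical result page for the query keyword "python".
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a><a href="/help">Help</a></div>
<div id="results">
  <p>Learning Python, 5th Edition</p>
  <p>Python Cookbook</p>
</div>
<div id="footer">Copyright notice</div>
</body></html>"""

region = useful_region(PAGE, ["python"])
print(region.getAttribute("id"))  # prints "results"
```

In this sketch the navigation bar and footer fall outside the returned element and would be discarded. Real deep web pages are rarely well-formed and may repeat the keyword in noisy regions, so a practical implementation would need a tolerant HTML parser and a more robust boundary rule.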
Keywords :
Internet; data handling; database management systems; query processing; DOM tools; ENDW; backend databases; deep Web pages; information extraction; noisy data elimination; noisy data region; query keywords based approach; useful data region; Data mining; Databases; HTML; Noise measurement; Visualization; Web pages; deep web; web information extraction
Conference_Title :
Business Computing and Global Informatization (BCGIN), 2012 Second International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-4469-2
DOI :
10.1109/BCGIN.2012.138