Title :
VEDD - a visual wrapper for extraction of data using DOM tree
Author :
Tripathy, A.K. ; Joshi, Nilakshi ; Thomas, Steffy ; Shetty, Shweta ; Thomas, Namitha
Author_Institution :
Dept. of Comput. Eng., Don Bosco Inst. of Technol., Mumbai, India
Abstract :
The World Wide Web plays an important role in searching for information on the data network, and users are constantly exposed to an ever-growing flood of information. A wrapper is an application that helps retrieve Search Results Records (SRRs) from multiple search engines, making the search more efficient and reliable. The VEDD wrapper extracts the relevant SRRs from three search engines by filtering out noisy and redundant records. The unique set of records is then displayed on a common VEDD search result page. The extraction is performed using the Document Object Model (DOM) tree. The paper presents a series of data filters that detect and remove irrelevant data from the web page; the same filters are also used to improve the similarity check of data records. In addition, visual cues from the underlying browser rendering engine are used to locate and extract the relevant data region from the deep web by means of a keyword matching technique.
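Illustrative sketch (not taken from the paper): the pipeline described in the abstract, that is, extracting records from several result pages, filtering them by keyword matching, and merging them into one unique set, might look roughly like the Python below. Everything here is an assumption made for illustration: the names LinkRecordParser, extract_records, keyword_filter and merge_unique are hypothetical, each record is simplified to a link with its anchor text, the standard library's event-based HTML parser stands in for a full DOM tree traversal, and duplicates are detected by a normalized URL instead of the paper's similarity check. The visual cues from the rendering engine are not modelled.

```python
from html.parser import HTMLParser


class LinkRecordParser(HTMLParser):
    """Collects a {url, title} record for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            title = " ".join("".join(self._text).split())
            self.records.append({"url": self._href, "title": title})
            self._href = None


def extract_records(html):
    """Return candidate records (links with anchor text) from one page."""
    parser = LinkRecordParser()
    parser.feed(html)
    parser.close()
    return parser.records


def keyword_filter(records, query):
    """Keep records whose title contains at least one query term."""
    terms = query.lower().split()
    return [r for r in records if any(t in r["title"].lower() for t in terms)]


def merge_unique(result_lists):
    """Merge per-engine lists, dropping records whose normalized URL repeats."""
    seen = set()
    merged = []
    for records in result_lists:
        for r in records:
            key = r["url"].rstrip("/").lower()  # crude stand-in for a similarity check
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged
```

Under these assumptions, each search engine's result page would be passed through extract_records and keyword_filter, and the resulting lists handed to merge_unique to obtain the single de-duplicated list shown to the user.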
Keywords :
Internet; Web sites; information filtering; information filters; search engines; string matching; DOM tree; VEDD wrapper; Web page; World Wide Web; browser rendering engine; data extraction; data filters; data network; data record similarity check; deep Web; document object model; irrelevant data detection; irrelevant data removal; keyword matching technique; search engines; search results records; visual wrapper; Data mining; Filtering; Flowcharts; HTML; Search engines; Visualization; Web pages; Content Keyword; DOM tree; Information extraction; Search engine results page;
Conference_Title :
Communication, Information & Computing Technology (ICCICT), 2012 International Conference on
Conference_Location :
Mumbai
Print_ISBN :
978-1-4577-2077-2
DOI :
10.1109/ICCICT.2012.6398114