Title :
Semi-Automated Wrappers Using Rule Trees
Author :
Iasinschi, Adrian ; Cosulschi, Mirel
Author_Institution :
Fac. of Math. & Comput. Sci., Univ. of Craiova, Craiova, Romania
Abstract :
In this paper we describe the concept of a semi-automated wrapper for extracting information from semi-structured pages, which are usually part of data-intensive e-commerce web sites. The process is based on creating extraction rules visually, using the DOM tree associated with an XHTML document to help the user make the right decisions. The extraction rules defined in this way have a natural tree structure. Based on the resulting model, the wrapper can then be used to navigate through the site and extract the relevant data.
Keywords :
Web sites; electronic commerce; information retrieval; tree data structures; DOM tree; XHTML document; data intensive Web sites; e-commerce; information extraction; rule trees; semiautomated wrapper; semistructured pages; tree structure; Competitive intelligence; Computer science; Data mining; HTML; Humans; Java; Mathematics; Scientific computing; Web pages; XML; rule; semi-automated wrapper; tree; web data extraction;
Conference_Title :
Symbolic and Numeric Algorithms for Scientific Computing, 2008. SYNASC '08. 10th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-0-7695-3523-4
DOI :
10.1109/SYNASC.2008.67