DocumentCode :
2966991
Title :
Semi-Automated Wrappers Using Rule Trees
Author :
Iasinschi, Adrian ; Cosulschi, Mirel
Author_Institution :
Fac. of Math. & Comput. Sci., Univ. of Craiova, Craiova, Romania
fYear :
2008
fDate :
26-29 Sept. 2008
Firstpage :
209
Lastpage :
215
Abstract :
In this paper we describe the concept of a semi-automated wrapper for extracting information from semi-structured pages, typically found on data-intensive e-commerce web sites. The process is based on creating extraction rules visually, using the DOM tree associated with an XHTML document to help the user make the right decisions. The extraction rules have a natural tree structure. Once the model has been designed, the wrapper can be used to navigate the site and extract the relevant data.
Keywords :
Web sites; electronic commerce; information retrieval; tree data structures; DOM tree; XHTML document; data intensive Web sites; e-commerce; information extraction; rule trees; semiautomated wrapper; semistructured pages; tree structure; Competitive intelligence; Computer science; Data mining; HTML; Humans; Java; Mathematics; Scientific computing; Web pages; XML; rule; semi-automated wrapper; tree; web data extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Symbolic and Numeric Algorithms for Scientific Computing, 2008. SYNASC '08. 10th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-0-7695-3523-4
Type :
conf
DOI :
10.1109/SYNASC.2008.67
Filename :
5204813