DocumentCode
2708008
Title
Aligning Data Records Using WordNet
Author
Hong, Jer Lang ; Siew, Eu-Gene ; Egerton, Simon
Author_Institution
Sch. of IT, Monash Univ., Monash, VIC, Australia
fYear
2010
fDate
7-10 May 2010
Firstpage
56
Lastpage
60
Abstract
Visual wrappers use visual information in addition to DOM Tree properties to extract data records. A closer look, however, shows that visual information can also be used to align data records into tabular form. In this paper, we propose a data alignment algorithm that aligns data records using DOM Tree properties and the visual cues of data records. The algorithm uses a regular expression rule and incorporates visual cues, such as the relative position and size of data items, to provide options for aligning iterative and disjunctive data items. Results show that our wrapper performs better than existing state-of-the-art wrappers.
Keywords
Web sites; natural language processing; ontologies (artificial intelligence); tree data structures; word processing; WordNet; data records alignment; disjunctive data item; iterative data item; lexical database; ontological technique; Data mining; Databases; HTML; Information retrieval; Metasearch; Ontologies; Positron emission tomography; Search engines; Tree data structures; Web pages; Automatic Wrapper; Ontology domain; Search Engines Result Pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Research and Development, 2010 Second International Conference on
Conference_Location
Kuala Lumpur
Print_ISBN
978-0-7695-4043-6
Type
conf
DOI
10.1109/ICCRD.2010.79
Filename
5489407