Title :
Web data cleansing and preparation for ontology extraction using WordNet
Author :
Tan, Keng-Woei ; Han, Hyoil ; Elmasri, Ramez
Author_Institution :
Dept. of Comput. Sci. Eng., Texas Univ., Arlington, TX, USA
Abstract :
The explosive growth of data on the World Wide Web makes information management and knowledge discovery increasingly difficult. Applying database techniques to manage Web information can help solve these problems. One difficulty is that Web documents, unlike structured databases, contain unstructured and semi-structured data. Our hypothesis is that creating ontologies to describe the semantics of Web data is the key to bridging the gap between semi-structured data and structured databases, and hence to facilitating the application of database techniques. We automatically extract an ontology (or conceptual schema) from a set of Web pages in a particular application domain. The prototype we are constructing is called WebOntEx (Web Ontology Extraction). This paper describes the data preparation process and the semantic resolution process that the WebOntEx project uses to build a meta-database and a Web database.
Keywords :
data mining; data preparation; database management systems; information resources; meta data; Web data semantics; Web database; Web documents; Web pages; WebOntEx; WordNet; World Wide Web; conceptual schema; data cleansing; database techniques; information management; knowledge discovery; meta-database; ontology extraction; semantic resolution; semi-structured data; unstructured data; Computer science; Databases; Explosives; HTML; Information management; Knowledge management; Ontologies; Prototypes
Conference_Title :
Web Information Systems Engineering, 2000. Proceedings of the First International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-0577-5
DOI :
10.1109/WISE.2000.882844