Title :
Reusing of information constructed in HTML documents: A conversion of HTML into OWL
Author :
Hwangbo, Hoon ; Lee, Hongchul
Author_Institution :
Dept. of Inf. & Manage. Eng., Korea Univ., Seoul
Abstract :
There have been efforts to build a knowledge-based web, represented by the Semantic Web. In this trend, however, HTML is not appropriate as a language for ontologies or for structuring information. Because a vast amount of information already exists in HTML, it seems rational to reuse that data. Previous studies are insufficient for converting HTML into OWL broadly, because they focus mainly on structured data (table tags) and provide only simple implementations. In addition, GRDDL, a W3C recommendation, needs an additional script for each conversion, and its output format is RDF, which has some restrictions. This paper offers a three-step conversion: (1) extracting information, (2) acquiring triples, and (3) constructing the ontology. There are two types of information: text-formed and non-text-formed. Correspondingly, there are two kinds of tags: those that include only text-formed information, and those that include both text-formed and non-text-formed information. Depending on the type of tag, we classify tags into categories and set rules for each category. Using those rules, we make triples and finally construct the ontology.
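The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the actual tag categories or rules, so the tag sets, the property naming scheme (`has_<tag>_text`, `has_<tag>_<attribute>`), the `TripleExtractor` class, the `to_turtle` function, and the `http://example.org/doc#` namespace are all assumptions made for illustration. The sketch uses only Python's standard `html.parser` module and writes the result as Turtle with OWL property declarations.

```python
from html.parser import HTMLParser

# Assumed tag categories (hypothetical; the paper's real classification is not in the abstract).
TEXT_ONLY_TAGS = {"title", "h1", "h2", "p", "li"}   # carry only text-formed information
MIXED_TAGS = {"a": "href", "img": "src"}            # tag -> attribute holding non-text-formed information
VOID_TAGS = {"img"}                                 # no closing tag, so never pushed on the stack


class TripleExtractor(HTMLParser):
    """Steps 1 and 2: extract information and record it as (subject, predicate, object) triples."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.triples = []
        self._stack = []  # currently open tags that we track

    def handle_starttag(self, tag, attrs):
        if tag in MIXED_TAGS:
            attr = MIXED_TAGS[tag]
            value = dict(attrs).get(attr)
            if value:
                self.triples.append((self.subject, f"has_{tag}_{attr}", value))
        if (tag in TEXT_ONLY_TAGS or tag in MIXED_TAGS) and tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Attribute the text to the innermost tracked tag.
            self.triples.append((self.subject, f"has_{self._stack[-1]}_text", text))


def to_turtle(triples, base="http://example.org/doc#"):
    """Step 3: declare each predicate as an OWL datatype property and emit the triples."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
    ]
    for prop in sorted({p for _, p, _ in triples}):
        lines.append(f":{prop} a owl:DatatypeProperty .")
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f':{s} :{p} "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    extractor = TripleExtractor("page1")
    extractor.feed('<h1>Title</h1><p>See <a href="http://example.org/x">this</a></p>')
    extractor.close()
    print(to_turtle(extractor.triples))
```

Treating every extracted value as a string literal keeps the sketch short; a fuller conversion would presumably map links to object properties and classify individuals, which the abstract does not detail.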
Keywords :
classification; hypermedia markup languages; knowledge representation languages; semantic Web; text analysis; HTML document; OWL; information extraction; information reuse; nontext-formed information; ontology construction; structured data conversion; tag classification; triple acquiring; Data mining; HTML; Image converters; Joining processes; Ontologies; Paper technology; Resource description framework; XML; Analyzing system of English grammar; Conversion; Data extraction; Reusing information
Conference_Title :
Control, Automation and Systems, 2008. ICCAS 2008. International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-89-950038-9-3
Electronic_ISBN :
978-89-93215-01-4
DOI :
10.1109/ICCAS.2008.4694654