DocumentCode :
2132653
Title :
Semantic annotation of semi-structured documents
Author :
Ranganathan, Girish R. ; Biletskiy, Yevgen ; Kaltchenko, Alexey
Author_Institution :
New Brunswick Univ., Fredericton, NB
fYear :
2008
fDate :
4-7 May 2008
Abstract :
This paper proposes a novel method for semantic annotation of semi-structured documents using GATE (General Architecture for Text Engineering), one of the best-known and most powerful annotation tools. The problem with GATE is that it is designed to annotate plain text and perform natural language processing (NLP). Hence, when a semi-structured document is loaded, GATE does not preserve its markup or formatting information: depending on the document loading option ("markup aware" or not), it either annotates the whole document including the markup, or extracts only the text and destroys the original document structure. This behavior is unacceptable when annotation information must be saved back into original documents in popular formats (such as Microsoft Word, Excel, etc.). The solution proposed in this paper saves annotations in the original documents without destroying the document contents or formatting information. The method is particularly important for semantically enriching semi-structured documents (notably Microsoft Word and Excel) because it allows specific information within these documents, rather than the whole document, to be related to ontological information such as ontology instances, without disturbing the original information.
Keywords :
natural language processing; ontologies (artificial intelligence); text analysis; ontology information extraction; semantic plain text annotation; semi structured document; text engineering; Decision support systems; Annotation; Computer Applications; GATE; Information Extraction; Ontology; Software Engineering
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
0840-7789
Print_ISBN :
978-1-4244-1642-4
Electronic_ISBN :
0840-7789
Type :
conf
DOI :
10.1109/CCECE.2008.4564670
Filename :
4564670