DocumentCode :
2344042
Title :
Extracting Document Structure to Facilitate a Knowledge Base Creation for The UML Superstructure Specification
Author :
Nojoumian, Mehrdad ; Lethbridge, Timothy C.
Author_Institution :
Sch. of Inf. Technol. & Eng., Ottawa Univ., Ont.
fYear :
2007
fDate :
2-4 April 2007
Firstpage :
393
Lastpage :
400
Abstract :
The research presented in this paper aims at facilitating the creation of knowledge bases (KBs) for software specifications, of which the UML superstructure specification is our initial target. Our motivation is that such specifications are dense, repetitive and difficult to use. They are written primarily in semi-structured text, but the structure must be maintained manually as they are edited, resulting in inconsistency. End users cannot use them efficiently because of the duplications, numerous concepts connected only implicitly, and general complexity of the document. Our immediate objective is to generate a KB for the UML specification by extracting knowledge from as many sources as possible in the document such as document structure, embedded natural language, as well as implicit and explicit cross references. In this paper our focus is the first step: extraction of the document's logical structure. Many key concepts of a document are expressed in this structure, which includes the headings of the chapters, sections, subsections, etc. By extracting such a structure in XML format, we can form a good infrastructure for the subsequent KB creation steps.
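The abstract describes extracting a document's logical structure (chapter, section and subsection headings) into XML. As a rough illustration only, and not the authors' actual method, the Python sketch below assumes the headings have already been recovered as plain-text lines with dotted numbering (e.g. "7.3.1 Abstraction") and nests them by numbering depth. The function name `headings_to_xml`, the heading regex, and the `document`/`section` element names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading heuristic: a dotted section number, whitespace, then a
# title starting with a capital letter (e.g. "7.3.1 Abstraction"). Real
# specifications would likely need layout cues (fonts, spacing) as well.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+([A-Z].*)$")


def headings_to_xml(lines):
    """Nest numbered headings into an XML tree according to their depth."""
    root = ET.Element("document")
    # Stack of (depth, element); the root sits at depth 0 and is never popped.
    stack = [(0, root)]
    for line in lines:
        match = HEADING.match(line.strip())
        if match is None:
            continue  # not a heading; body text is ignored in this sketch
        number, title = match.groups()
        depth = number.count(".") + 1  # "7" -> 1, "7.3" -> 2, "7.3.1" -> 3
        # Close any open sections at the same or a deeper level.
        while stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1]
        section = ET.SubElement(parent, "section", number=number, title=title)
        stack.append((depth, section))
    return root


if __name__ == "__main__":
    sample = [
        "7 Classes",
        "7.1 Overview",
        "This chapter defines the Classes package.",  # body text, skipped
        "7.3 Class Descriptions",
        "7.3.1 Abstraction",
        "7.3.2 AggregationKind",
    ]
    print(ET.tostring(headings_to_xml(sample), encoding="unicode"))
```

Tracking depth explicitly on the stack (rather than indexing by depth) keeps the nesting correct even when a level is skipped, e.g. "7" followed directly by "7.3.1".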
Keywords :
Unified Modeling Language; document handling; formal specification; knowledge based systems; UML superstructure specification; XML format; document analysis; document conversion; document structure; embedded natural language; information extraction; knowledge acquisition; knowledge base creation; knowledge extraction; logical structure; software specification; Data mining; HTML; Humans; Information technology; Knowledge acquisition; Knowledge engineering; Natural languages; Text analysis; Unified modeling language; XML;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Information Technology, 2007. ITNG '07. Fourth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-2776-0
Type :
conf
DOI :
10.1109/ITNG.2007.93
Filename :
4151716