Title :
A Corpus-based evaluation of lexical components of a domain-specific text to Knowledge Mapping prototype
Author :
Shams, Rushdi ; Elsayed, Adel
Author_Institution :
Dept. of Comput. Sci. & Eng., Khulna Univ. of Eng. & Technol. (KUET), Khulna
Abstract :
The aim of this paper is to evaluate the lexical components of a text to knowledge mapping (TKM) prototype. The prototype is domain-specific; its purpose is to map instructional text onto a knowledge domain. The knowledge domain of the prototype is physics, specifically DC electrical circuits. During development, the prototype was tested with a limited data set from the domain. The prototype has now reached a stage where it needs to be evaluated with a representative linguistic data set called a corpus. A corpus is a collection of text drawn from typical sources that can be used as a test data set to evaluate NLP systems. As no corpus is available for the domain, we developed a representative corpus and annotated it with linguistic information. The evaluation considers one of the prototype's two main components, the lexical knowledge base. Using the corpus, the evaluation enriches lexical knowledge resources such as vocabulary and grammar structure. This enables the prototype to parse a reasonable number of sentences in the corpus.
Keywords :
grammars; linguistics; natural language processing; text analysis; vocabulary; NLP system; corpus-based evaluation; domain-specific text; grammar; instructional text; knowledge domain; lexical component; lexical knowledge resources; linguistic data set; text to knowledge mapping; Circuit testing; Computer science; Design engineering; Information technology; Knowledge engineering; Physics; Prototypes; System testing; Tagging; Corpus; Knowledge Mapping; Lexicon; Morphology; NLP Systems;
Conference_Title :
Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
Conference_Location :
Khulna
Print_ISBN :
978-1-4244-2135-0
Electronic_ISBN :
978-1-4244-2136-7
DOI :
10.1109/ICCITECHN.2008.4803005