Title :
Linking book characters: toward a corpus encoding relations between entities
Author :
Cristea, D.; Ignat, Eugen
Author_Institution :
Dept. of Comput. Sci., Alexandru Ioan Cuza Univ. of Iasi, Iasi, Romania
Abstract :
What does a novel bring to a reader? What can it bring to a machine? Is there a chance that a machine will decipher the messages a book expresses in free language? Part of the content of a text is encoded in relations between entities. To decode them, algorithms rely on learning techniques whose training is guided by corpora that make entities and relations explicit. The creation of a gold corpus for training and evaluation is therefore of primary concern. This paper proposes annotation conventions and methodological prerequisites for creating a corpus that highlights the characters in a book and the relations mentioned as holding between them, of four types: anaphoric, affective, kinship and social. The language under investigation is Romanian and the text type is fiction, but the proposed conventions are intended to be applicable to any language and type of text.
Keywords :
learning (artificial intelligence); literature; natural language processing; text analysis; Romanian language; affective type; anaphoric type; annotation convention; book character linking; corpus creation; corpus encoding relations; evidence characters; fiction text; kinship type; learning technique; methodological prerequisites; social type; Gold; Joining processes; Knowledge based systems; Semantics; Syntactics; Training; XML; anaphoric relations; annotated corpora; annotation conventions; content analysis; entity linking; semantic relations; text analytics; text understanding;
Conference_Title :
Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on
Conference_Location :
Cluj-Napoca
DOI :
10.1109/SpeD.2013.6682658