Title :
Parallel treebank from word-aligned bilingual corpus. Language engineering for phrasal alignments
Author_Institution :
Dept. of Comput. Sci., Univ. of Craiova, Craiova, Romania
Abstract :
In this paper we describe a mechanism for generating a parallel treebank between an intensively studied language (i.e. English) and a less studied language such as Romanian. The Romanian constituents of the treebank are induced from the corresponding English constituents, taking into account the word alignments of the corpus. The proposed mechanism reuses and adapts existing tools and algorithms for automatic part-of-speech annotation and syntactic tree alignment.
Keywords :
natural language processing; Romanian; language engineering; parallel treebank; part-of-speech annotation; phrasal alignments; syntactic trees alignment; word-aligned bilingual corpus; Europe; Pragmatics; Proposals; Syntactics; Tagging; Training
Conference_Title :
System Theory, Control, and Computing (ICSTCC), 2011 15th International Conference on
Conference_Location :
Sinaia
Print_ISBN :
978-1-4577-1173-2