Title :
CONVEX: Conjunct Verb extraction from parallel corpus: A hybrid approach
Author :
Choudhury, S.K. ; Kundu, Bijoy
Author_Institution :
Language Technol., Centre for Dev. of Adv. Comput., Kolkata, India
Abstract :
Conjunct Verbs (CVs) are a special form of Complex Predicate: they behave as a single verbal unit but maintain a multiword structure. CVs play an important role in Natural Language Processing applications such as Speech-to-Speech Translation, Machine Translation, and lexical resource creation. Because of their distinct construction, however, detecting and extracting CVs is a challenging task. This paper presents a hybrid approach, combining rule-based and statistical methods, for mining CVs from a parallel corpus. Although the proposed approach has been applied to a Bangla-English parallel corpus to extract Bangla CVs, the methodology is equally applicable to other Indian languages of the Indo-Aryan family, provided that a part-of-speech tagger and a sufficiently large parallel corpus are available. Evaluated on a Bangla-English parallel corpus of 50,000 sentences, the proposed approach yields an accuracy of 76%, which can be improved by increasing the number of sentence pairs in the parallel corpus.
Keywords :
data mining; grammars; natural language processing; Bangla-English parallel corpus; CONVEX; complex predicates; conjunct verb extraction; conjunct verb mining; lexical resource creation; machine translation; multiword structure; rule-based approach; single verbal unit; speech to speech translation; statistical approach; Accuracy; Helium; Hidden Markov models; Periodic structures; Pragmatics; Speech; Speech processing; Conjunct Verbs; Expectation Maximization; Parallel Corpus; Word Alignment;
Conference_Title :
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4673-4367-1
DOI :
10.1109/IHCI.2012.6481852