DocumentCode
1607121
Title
CONVEX: Conjunct Verb extraction from parallel corpus: A hybrid approach
Author
Choudhury, S.K. ; Kundu, Bijoy
Author_Institution
Language Technol., Centre for Dev. of Adv. Comput., Kolkata, India
fYear
2012
Firstpage
1
Lastpage
6
Abstract
Conjunct Verbs (CVs) are a special form of Complex Predicate that behaves as a single verbal unit while retaining a multiword structure. CVs play an important role in Natural Language Processing applications such as Speech-to-Speech Translation, Machine Translation, and lexical resource creation. Because of their distinct construction, however, detecting and extracting CVs is a challenging task. This paper presents a hybrid approach, combining rule-based and statistical methods, for mining CVs from a parallel corpus. Although the approach has been applied to a Bangla-English parallel corpus to extract Bangla CVs, the methodology is equally applicable to other Indian languages of the Indo-Aryan family, provided a part-of-speech tagger and a sufficiently large parallel corpus are available. Evaluated on a Bangla-English parallel corpus of 50,000 sentences, the proposed approach yields an accuracy of 76%, which can be improved by increasing the number of sentence pairs in the parallel corpus.
Keywords
data mining; grammars; natural language processing; Bangla-English parallel corpus; CONVEX; complex predicates; conjunct verb extraction; conjunct verb mining; lexical resource creation; machine translation; multiword structure; rule-based approach; single verbal unit; speech to speech translation; statistical approach; Accuracy; Helium; Hidden Markov models; Periodic structures; Pragmatics; Speech; Speech processing; Conjunct Verbs; Expectation Maximization; Parallel Corpus; Word Alignment
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location
Kharagpur
Print_ISBN
978-1-4673-4367-1
Type
conf
DOI
10.1109/IHCI.2012.6481852
Filename
6481852