• DocumentCode
    1607121
  • Title

    CONVEX: Conjunct Verb extraction from parallel corpus: A hybrid approach

  • Author

    Choudhury, S.K. ; Kundu, Bijoy

  • Author_Institution
    Language Technol., Centre for Dev. of Adv. Comput., Kolkata, India
  • fYear
    2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Conjunct Verbs (CVs) are a special form of Complex Predicate that behave as a single verbal unit but retain a multiword structure. CVs play an important role in Natural Language Processing applications such as Speech-to-Speech Translation, Machine Translation and lexical resource creation. However, because of their distinct construction, detecting and extracting CVs is a challenging task. This paper presents a hybrid approach for mining CVs from a parallel corpus that combines rule-based and statistical methods. Although the proposed approach has been applied to a Bangla-English parallel corpus to extract Bangla CVs, the methodology is equally applicable to other Indian languages of the Indo-Aryan family, given a part-of-speech tagger and a sufficiently large parallel corpus. Evaluated on a Bangla-English parallel corpus of 50,000 sentences, the proposed approach yields an accuracy of 76%, which can be improved by increasing the number of sentence pairs in the parallel corpus.
  • Keywords
    data mining; grammars; natural language processing; Bangla-English parallel corpus; CONVEX; complex predicates; conjunct verb extraction; conjunct verb mining; lexical resource creation; machine translation; multiword structure; rule-based approach; single verbal unit; speech to speech translation; statistical approach; Accuracy; Helium; Hidden Markov models; Periodic structures; Pragmatics; Speech; Speech processing; Conjunct Verbs; Expectation Maximization; Parallel Corpus; Word Alignment
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
  • Conference_Location
    Kharagpur
  • Print_ISBN
    978-1-4673-4367-1
  • Type

    conf

  • DOI
    10.1109/IHCI.2012.6481852
  • Filename
    6481852