  • DocumentCode
    2306611
  • Title
    Building a foundation of HPSG-based treebank on Bangla language
  • Author
    Mahmud, Altaf; Khan, Mumit
  • Author_Institution
    CRBLP, BRAC Univ., Dhaka
  • fYear
    2007
  • fDate
    27-29 Dec. 2007
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Nowadays, the importance of a large annotated corpus for NLP researchers is widely recognized. In this paper, we describe the initial phase of developing a linguistically annotated corpus for the non-configurational language Bangla. Since the formalism differs from those posited for configurational languages, several features have been added for constraint-based parsing through an HPSG-based formalism. We propose an outline of a semi-automated process that applies both a case-marking approach and some morphological analysis to constrain the parsing of a relatively free word order language, in order to create a linguistically rich, highly lexicalized annotated corpus.
  • Keywords
    context-free grammars; context-free languages; natural language processing; tree data structures; Bangla language; HPSG-based treebank; NLP; free word order language; head-driven phrase structure grammar formalism; lexicalized annotated corpus; Bidirectional control; Books; Data mining; Information retrieval; Natural language processing; Natural languages; Pattern matching; Speech analysis; Standards development; Stochastic processes; hpsg; non-configurational; parsing; treebank; treebanking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2007. ICCIT 2007. 10th International Conference on
  • Conference_Location
    Dhaka
  • Print_ISBN
    978-1-4244-1550-2
  • Electronic_ISBN
    978-1-4244-1551-9
  • Type
    conf
  • DOI
    10.1109/ICCITECHN.2007.4579375
  • Filename
    4579375