DocumentCode
2306611
Title
Building a foundation of HPSG-based treebank on Bangla language
Author
Mahmud, Altaf; Khan, Mumit
Author_Institution
CRBLP, BRAC Univ., Dhaka
fYear
2007
fDate
27-29 Dec. 2007
Firstpage
1
Lastpage
6
Abstract
Nowadays, the importance of a large annotated corpus for NLP researchers is widely recognized. In this paper, we describe the initial phase of developing a linguistically annotated corpus for Bangla, a non-configurational language. Since the formalism for such a language differs from those posited for configurational languages, several features have been added for constraint-based parsing within an HPSG-based formalism. We outline a semi-automated process that applies both a case-marking approach and morphological analysis to constrain the parsing of a relatively free-word-order language, with the aim of creating a linguistically rich, highly lexicalized annotated corpus.
Keywords
context-free grammars; context-free languages; natural language processing; tree data structures; Bangla language; HPSG-based treebank; NLP; free word order language; head-driven phrase structure grammar formalism; lexicalized annotated corpus; Bidirectional control; Books; Data mining; Information retrieval; Natural language processing; Natural languages; Pattern matching; Speech analysis; Standards development; Stochastic processes; hpsg; non-configurational; parsing; treebank; treebanking
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Technology, 2007. ICCIT 2007. 10th International Conference on
Conference_Location
Dhaka
Print_ISBN
978-1-4244-1550-2
Electronic_ISBN
978-1-4244-1551-9
Type
conf
DOI
10.1109/ICCITECHN.2007.4579375
Filename
4579375