• DocumentCode
    163217
  • Title
    Text corpus for natural language story-telling sentence generation: A design and evaluation
  • Author
    Limpanadusadee, Worasa ; Punyabukkana, Proadpran ; Suchato, Atiwong ; Poobrasert, Onintra

  • Author_Institution
    Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
  • fYear
    2014
  • fDate
    14-16 May 2014
  • Firstpage
    80
  • Lastpage
    85
  • Abstract
    Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). Regardless of the complexity of the Natural Language Processing used in a sentence generation procedure, the quality of the language model affects the generation results. This work compared the sentence generation accuracies of a multi-tier N-gram-based procedure trained on BEST2010, a large publicly available text corpus, with those of the same procedure trained on a smaller but more specifically designed corpus, in the task of Thai simple sentence generation. The latter, a new corpus called TELL-S, was created from an analysis of the content of the textbooks used in grades 1 and 2 for Thai language subjects under the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints based on a story-telling guideline developed for LD children. Evaluated on test sets of 195 sentences, each composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S provided better generalization and yielded higher accuracies than BEST2010 in all cases with unbiased word sets. The sentence generation accuracies were 100% and 70% for 3-word and 4-word sentences, respectively. The average accuracy was 58.8% when longer sentences were also included.
  • Keywords
    computational linguistics; computer aided instruction; handicapped aids; natural language processing; AAC system; BEST2010; TELL-S; Thai simple sentence generation; augmentative and alternative communication; language model; learning disability; multitier N-gram; narrative sentences; part-of-speech combination; story-telling sentence generation; text corpus; Augmentative and Alternative Communication; Corpus Management; Learning Disabilities; N-Gram Model; Natural Language Generation; Statistical Natural Language Processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)
  • Conference_Location
    Chon Buri
  • Print_ISBN
    978-1-4799-5821-4
  • Type
    conf
  • DOI
    10.1109/JCSSE.2014.6841846
  • Filename
    6841846