• DocumentCode
    727795
  • Title
    Towards a hybrid NLG system for Data2Text in Portuguese
  • Author
    Pereira, Jose Casimiro; Teixeira, Antonio; Sousa Pinto, Joaquim
  • Author_Institution
    Inst. Politec. Tomar, Tomar, Portugal
  • fYear
    2015
  • fDate
    17-20 June 2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In many new interactions with machines, such as dialogue or voice output, information internal to a system must be converted into sentences using Data2Text systems. To avoid the limitations of template-based and classical NLG methods, systems based on automatic translation have been proposed in recent years. These systems provide sentences with the variability needed for a better interaction, but this comes at a cost: unlike template-based systems, they produce sentences of heterogeneous quality. In this paper we propose to combine a translation-based NLG system with a classifier module capable of providing information on the intelligibility or quality of the sentences. Sentences marked as unacceptable are replaced by template-generated ones. This classifier module is the main focus of the paper and combines extraction of linguistic features with a classifier trained on a manually annotated corpus. Results suggest that our approach is valid, as the best results obtained have false positives below 8%, and this metric can be even lower in practical applications, decreasing to around 3%, as the generation module produces low-quality sentences at a rate lower than 30%.
  • Keywords
    natural language processing; text analysis; Data2Text systems; Portuguese; automatic translation; classical NLG methods; classifier module; hybrid NLG system; information internal; linguistic features; quality of the sentences; Feature extraction; Measurement; Natural languages; Pragmatics; Radio frequency; Support vector machines; Vegetation; Data2Text; Natural Language Generation (NLG); sentences quality evaluation; translation based NLG
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Systems and Technologies (CISTI), 2015 10th Iberian Conference on
  • Conference_Location
    Aveiro
  • Type
    conf
  • DOI
    10.1109/CISTI.2015.7170419
  • Filename
    7170419
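
The hybrid scheme described in the abstract (generate a sentence by translation, pass it through a quality classifier, and fall back to a template when it is marked unacceptable) can be sketched as below. This is only a minimal illustration under stated assumptions: the function name `hybrid_generate` and its three callable parameters are hypothetical and do not come from the paper, and the classifier is treated as an opaque yes/no function rather than the feature-based model the authors trained.

```python
from typing import Any, Callable


def hybrid_generate(
    data: Any,
    translate: Callable[[Any], str],        # hypothetical translation-based generator
    from_template: Callable[[Any], str],    # hypothetical template-based generator
    is_acceptable: Callable[[str], bool],   # hypothetical quality classifier
) -> str:
    """Return the translation-based sentence unless the classifier rejects it."""
    candidate = translate(data)
    if is_acceptable(candidate):
        # Keep the more varied sentence when it is judged acceptable.
        return candidate
    # Otherwise replace it with the predictable template-based sentence.
    return from_template(data)
```

In this arrangement a classifier false positive means a low-quality sentence reaches the user, which is why the abstract reports that metric.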