• DocumentCode
    735888
  • Title
    JANA: An Arabic human-human dialogues corpus
  • Author
    Elmadany, AbdelRahim A.; Abdou, Sherif M.; Gheith, Mervat
  • Author_Institution
    Dept. of Comput. Sci., Cairo Univ., Cairo, Egypt
  • fYear
    2015
  • fDate
    9-11 July 2015
  • Firstpage
    347
  • Lastpage
    352
  • Abstract
    We present JANA, a multi-genre corpus of Arabic dialogues labeled for Arabic Dialogues Language Understanding (ADLU) at the utterance level. This paper describes progress in the development of a human-human dialogue corpus of Arabic spontaneous Spoken Dialogues (SD) and Instant Messages (IM). We collected dialogues from call centers in different genres, such as banks, flights, and mobile network providers; these dialogues consist of transcribed phone calls and instant messages containing service inquiries directed to the call centers. In addition, the annotation schema and the manual segmentation of turns are described. The collected data consist of approximately 3001 turns, averaging 6.7 words per turn, and 4725 utterances, averaging 4.3 words per utterance, for a total of 20311 words. The corpus will be made freely available for academic and nonprofit research.
  • Keywords
    natural language processing; ADLU; Arabic dialogues language understanding; Arabic human-human dialogues corpus; Arabic spontaneous spoken dialogues; IM; JANA; SD; annotation schema; human-human dialogue corpus; instant messages; multigenre corpus; transcribed phone calls; Decision support systems; Economic indicators; Annotated corpus; Arabic Dialogues Corpus; Arabic Language Understanding; Dialogue Acts
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
  • Conference_Location
    Kolkata
  • Type
    conf
  • DOI
    10.1109/ReTIS.2015.7232903
  • Filename
    7232903