Title :
TALAA-ASC: A sentence compression corpus for Arabic
Author :
Riadh Belkebir;Ahmed Guessoum
Author_Institution :
Natural Language Processing and Machine Learning Research Group, Laboratory of Research in Artificial Intelligence, Computer Science Department, Université des Sciences et de la Technologie Houari Boumediene (USTHB), Algiers, Algeria
Abstract :
Sentence compression has been studied extensively for many languages, but little effort has been devoted to Arabic. One reason is the absence of Arabic sentence compression corpora: building and evaluating sentence compression systems requires parallel corpora of source sentences paired with their corresponding compressions. In this paper, we present TALAA-ASC, the first Arabic sentence compression corpus. We describe the methodology we followed to construct the corpus and report the statistics and analyses we performed on it.
Keywords :
"XML","Buildings","Guidelines","Natural language processing","Supervised learning","Integer linear programming","Noise measurement"
Conference_Title :
Computer Systems and Applications (AICCSA), 2015 IEEE/ACS 12th International Conference of
Electronic_ISBN :
2161-5330
DOI :
10.1109/AICCSA.2015.7507228