DocumentCode :
3581287
Title :
Automatic Arabic text summarization using clustering and keyphrase extraction
Author :
Fejer, Hamzah Noori ; Omar, Nazlia
Author_Institution :
Center for AI Technol., Univ. Kebangsaan Malaysia (UKM), Bangi, Malaysia
fYear :
2014
Firstpage :
293
Lastpage :
298
Abstract :
As the number of electronic documents grows rapidly, faster techniques are needed to assess the relevance of these documents. A summary is a concise representation of an underlying text. Forming an ideal summary requires a full understanding of the document; however, achieving full understanding is difficult, if not impossible, for computers. Therefore, the most common technique in automated text summarization is to select important sentences from the original text and present them as the summary. This paper proposes a hybrid clustering method (partitioning and hierarchical) that groups many Arabic documents into several clusters. A keyphrase extraction module is then applied to extract important keyphrases from each cluster; these keyphrases help identify the most important sentences, and several similarity algorithms are used to find similar sentences. From each group of similar sentences (i.e., sentences whose similarity exceeds a predefined threshold), one sentence is extracted and the others are ignored. The model is designed for both single- and multi-document Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was used for evaluation, and the Essex Arabic Summaries Corpus, which contains many topic-based articles with multiple human summaries, served as the summarization dataset. The model achieved an accuracy of 80% for single-document and 62% for multi-document summarization.
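The sentence-selection step described in the abstract (rank sentences using keyphrases, then keep only one sentence from any group whose similarity exceeds a threshold) can be sketched as below. This is a minimal illustration, not the authors' implementation: the clustering stage is omitted, and the keyphrase-count scoring, the bag-of-words cosine similarity (standing in for the paper's unspecified "several similarity algorithms"), the 0.5 threshold, and the names `select_sentences` and `rouge_1_recall` are all hypothetical. The ROUGE-1 recall function is a simplified unigram-overlap version, not the official ROUGE toolkit.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    # Whitespace tokenization only (assumption); a real Arabic pipeline
    # would also normalize letters and stem or lemmatize tokens.
    return Counter(sentence.split())


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_sentences(sentences, keyphrases, threshold=0.5, max_sentences=3):
    # Hypothetical scoring: count how many keyphrases occur in each sentence;
    # ties are broken by original position.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-sum(kp in sentences[i] for kp in keyphrases), i),
    )
    chosen = []
    for i in ranked:
        vec = bag_of_words(sentences[i])
        # Skip a sentence whose similarity to any already-chosen sentence
        # exceeds the threshold, so one representative per similar group survives.
        if all(cosine(vec, bag_of_words(sentences[j])) <= threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    # Emit the summary in original document order.
    return [sentences[i] for i in sorted(chosen)]


def rouge_1_recall(candidate, reference):
    # Simplified ROUGE-1 recall: clipped unigram overlap / reference length.
    cand, ref = bag_of_words(candidate), bag_of_words(reference)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "dogs bark at night",
        "stocks fell sharply today",
    ]
    summary = select_sentences(sentences, ["cat", "mat", "dogs"], threshold=0.5, max_sentences=2)
    print(summary)
    print(rouge_1_recall(" ".join(summary), "a cat sat on the mat at night"))
```

In this toy run the second sentence is dropped because its cosine similarity to the first (about 0.87) exceeds the threshold, which mirrors the "keep one, ignore the rest" rule in the abstract.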
Keywords :
natural language processing; pattern clustering; text analysis; word processing; Arabic document; Essex Arabic summary corpus; ROUGE matrix; automatic Arabic text summarization; electronic document; hybrid clustering method; keyphrase extraction module; multidocument Arabic text summarization; recall-oriented understudy-for-gisting evaluation matrix; similarity algorithm; single-document Arabic text summarization; Clustering algorithms; Clustering methods; Couplings; Feature extraction; Filtering; Information technology; Multimedia communication; Clustering; Keyphrase Extraction; ROUGE Matrix; Similarity; Text Summarization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology and Multimedia (ICIMU), 2014 International Conference on
Type :
conf
DOI :
10.1109/ICIMU.2014.7066647
Filename :
7066647