DocumentCode :
710078
Title :
Semantic vector space model for reducing Arabic text dimensionality
Author :
Awajan, Arafat
Author_Institution :
Comput. Sci. Dept., Princess Sumaya Univ. for Technol., Amman, Jordan
fYear :
2015
fDate :
April 29 2015-May 1 2015
Firstpage :
129
Lastpage :
135
Abstract :
In this paper, we introduce an efficient method to represent Arabic texts in comparatively smaller sizes without losing significant information. The proposed method uses the linguistic features of the Arabic language, mainly its very productive morphology and its richness in synonyms, to reduce the dimension of the document vector and to improve its vector space model representation. We have incorporated semantic information from word thesauri like WordNet to create clusters of similar words extracted from the same root and regrouped along with their synonyms. Distributional similarity measures are applied on the word-context matrix associated with the document in order to identify similar words based on a text's context. The experimental results have confirmed that the proposed method significantly reduces the size of text representation by about 20% compared with the stem-based vector space model and by about 40% compared with the traditional bag of words model.
Keywords :
natural language processing; text analysis; Arabic language; Arabic text dimensionality; WordNet; document vector; linguistic features; productive morphology; semantic information; semantic vector space model; synonyms; text context; text representation; vector space model representation; word context matrix; word thesauri; Context; Decision support systems; Morphology; Pragmatics; Semantics; Silicon; Thesauri; Arabic language processing; Semantic vector space model; text dimension reduction; word-context matrix;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information and Communication Technology and its Applications (DICTAP), 2015 Fifth International Conference on
Conference_Location :
Beirut
Print_ISBN :
978-1-4799-4130-8
Type :
conf
DOI :
10.1109/DICTAP.2015.7113185
Filename :
7113185