DocumentCode :
2323951
Title :
Finding English and translated Arabic documents similarities using GHSOM
Author :
Selamat, Ali ; Ismail, H.H.
Author_Institution :
Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai
fYear :
2008
fDate :
13-15 May 2008
Firstpage :
460
Lastpage :
465
Abstract :
The idea of finding similar news across Arabic and English sources is to give the audience multiple views of broadcast news. Reading the news from a single source may not always reflect what is happening around the world, because readers and writers have different backgrounds, cultures and opinions. Many techniques have been used toward this goal by clustering documents with similar themes. In this paper, we analyze the similarity of the views expressed in English and translated Arabic news texts using a self-organizing map (SOM). However, we found that SOM has some difficulties that affect its performance. To address these performance problems, we used a growing hierarchical self-organizing map (GHSOM). The main advantage of such a mapping is the ease with which a user can gain an idea of the structure of the data by analyzing the map. Thousands of news documents were collected from Arabic and English news sources on the Web to train both algorithms. The experimental results show that GHSOM performs better at clustering documents that express the same opinions.
Keywords :
broadcasting; natural language processing; self-organising feature maps; text analysis; Arabic language; English language; World Wide Web; broadcasted news; document clustering; document similarity; growing hierarchical self-organizing map; news document; news sources; news translation; Broadcasting; Clustering algorithms; Computer science; Cultural differences; Data analysis; Genetic algorithms; Humans; Information systems; Support vector machines; Web pages;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computer and Communication Engineering, 2008. ICCCE 2008. International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-1691-2
Electronic_ISBN :
978-1-4244-1692-9
Type :
conf
DOI :
10.1109/ICCCE.2008.4580647
Filename :
4580647