Title :
A Parallel Comparator of Documents
Author :
Ksouri, Sonia Alouane ; Hidri, Minyar Sassi ; Barkaoui, Kamel
Author_Institution :
Ecole Nat. d'Ingénieurs de Tunis, Univ. Tunis El Manar, Tunis, Tunisia
Abstract :
Document, sentence and word clustering are well-studied problems. Most existing algorithms cluster documents, sentences and words separately rather than simultaneously. Moreover, when analyzing large textual corpora, the amount of data that can be processed on a single machine is usually limited by the available main memory, and growing data volumes lead to an increasing computational workload. In this paper we present PFT-Sim, a parallel fuzzy triadic similarity measure that computes fuzzy memberships for document co-clustering on a parallel programming architecture. It computes the fuzzy co-similarity matrices between documents/sentences and between sentences/words simultaneously, each matrix being built on the basis of the others. The PFT-Sim model provides a parallel data analysis strategy that divides the similarity computation task into parallel sub-tasks to address efficiency and scalability problems.
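The abstract does not give the PFT-Sim update equations. The sketch below is therefore only a rough, hypothetical illustration of the general idea it describes: three similarity matrices (documents, sentences, words) are updated together, each from the others. The update rule is a χ-Sim-style co-similarity iteration, which is an assumption and not the authors' formulation. Scaling each matrix into [0, 1] by its maximum stands in for the paper's fuzzy memberships, also an assumption. The helper names `parallel_matmul` and `triadic_cosim` are hypothetical. "Parallel sub-tasks" are modeled as row blocks of each matrix product dispatched to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

Matrix = list[list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def parallel_matmul(a: Matrix, b: Matrix, workers: int = 4) -> Matrix:
    """Compute a @ b by splitting the rows of `a` into blocks (the sub-tasks)."""
    bt = transpose(b)  # columns of b as rows, for cheap dot products
    n = len(a)
    step = max(1, -(-n // workers))  # ceiling division: rows per block
    blocks = [range(s, min(s + step, n)) for s in range(0, n, step)]

    def block(rows: range) -> Matrix:
        return [[sum(x * y for x, y in zip(a[i], col)) for col in bt] for i in rows]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(block, blocks))  # map preserves block order
    return [row for part in parts for row in part]


def normalize(m: Matrix) -> Matrix:
    """Hypothetical fuzzy-membership stand-in: scale entries into [0, 1]."""
    top = max((v for row in m for v in row), default=0.0)
    return [[v / top for v in row] for row in m] if top > 0 else m


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def triadic_cosim(ds: Matrix, sw: Matrix, iterations: int = 3, workers: int = 4):
    """ds: documents x sentences, sw: sentences x words (occurrence weights).

    Returns (doc-doc, sentence-sentence, word-word) similarity matrices.
    All three are recomputed from the previous iteration's values, so they
    are updated simultaneously, each on the basis of the others.
    """
    if not sw or (ds and len(ds[0]) != len(sw)):
        raise ValueError("ds must be D x S and sw must be S x W with S > 0")
    sim_d, sim_s, sim_w = identity(len(ds)), identity(len(sw)), identity(len(sw[0]))
    ds_t, sw_t = transpose(ds), transpose(sw)
    mm = lambda x, y: parallel_matmul(x, y, workers)
    for _ in range(iterations):
        # Documents are similar if they contain similar sentences.
        new_d = mm(mm(ds, sim_s), ds_t)
        # Words are similar if they occur in similar sentences.
        new_w = mm(mm(sw_t, sim_s), sw)
        # Sentences draw on both sides: shared documents and similar words.
        new_s = add(mm(mm(ds_t, sim_d), ds), mm(mm(sw, sim_w), sw_t))
        sim_d, sim_s, sim_w = normalize(new_d), normalize(new_s), normalize(new_w)
    return sim_d, sim_s, sim_w
```

For example, `triadic_cosim([[1, 1, 0], [0, 1, 1]], [[1, 0], [1, 1], [0, 1]], iterations=1)` gives the two documents a similarity of 0.5 through their shared middle sentence. Threads keep the sketch self-contained. Under CPython's GIL, however, pure-Python arithmetic gains little from threads, so a real speedup would need processes, a native linear-algebra library, or a distributed framework.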
Keywords :
data analysis; fuzzy set theory; matrix algebra; parallel programming; pattern clustering; text analysis; PFT-Sim; document co-clustering; fuzzy co-similarity matrices; fuzzy memberships; large textual corpuses; parallel comparator; parallel data analysis strategy; parallel fuzzy triadic similarity measure; parallel programming architecture; parallel sub-tasks; sentence clustering; similarity computing task; words clustering; Clustering algorithms; Complexity theory; Computational modeling; Computer architecture; Data models; Parallel processing; Text mining; Document co-clustering; Fuzzy sets; Parallel computing; Text Mining; Three-partite graph; multi-thread architecture;
Conference_Title :
Database and Expert Systems Applications (DEXA), 2013 24th International Workshop on
Conference_Location :
Los Alamitos, CA
Print_ISBN :
978-0-7695-5070-1
DOI :
10.1109/DEXA.2013.13