DocumentCode
3362954
Title
A Parallel Comparator of Documents
Author
Ksouri, Sonia Alouane ; Hidri, Minyar Sassi ; Barkaoui, Kamel
Author_Institution
Ecole Nat. d'Ingénieurs de Tunis, Univ. Tunis El Manar, Tunis, Tunisia
fYear
2013
fDate
26-30 Aug. 2013
Firstpage
48
Lastpage
52
Abstract
Document, sentence, and word clustering are well-studied problems. Most existing algorithms cluster documents, sentences, and words separately rather than simultaneously. However, when analyzing large textual corpora, the amount of data that can be processed on a single machine is usually limited by the available main memory, and growth in the data to be analyzed increases the computational workload. In this paper we present a parallel fuzzy triadic similarity measure, called PFT-Sim, that calculates fuzzy memberships in the context of document co-clustering on a parallel programming architecture. It simultaneously computes fuzzy co-similarity matrices between documents/sentences and sentences/words, each one built on the basis of the others. The PFT-Sim model provides a parallel data analysis strategy and divides the similarity computation into parallel sub-tasks to address efficiency and scalability problems.
Keywords
data analysis; fuzzy set theory; matrix algebra; parallel programming; pattern clustering; text analysis; PFT-Sim; document co-clustering; fuzzy co-similarity matrices; fuzzy memberships; large textual corpuses; parallel comparator; parallel data analysis strategy; parallel fuzzy triadic similarity measure; parallel programming architecture; parallel sub-tasks; sentence clustering; similarity computing task; words clustering; Clustering algorithms; Complexity theory; Computational modeling; Computer architecture; Data models; Parallel processing; Text mining; Document co-clustering; Fuzzy sets; Parallel computing; Text Mining; Three-partite graph; multi-thread architecture;
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications (DEXA), 2013 24th International Workshop on
Conference_Location
Los Alamitos, CA
ISSN
1529-4188
Print_ISBN
978-0-7695-5070-1
Type
conf
DOI
10.1109/DEXA.2013.13
Filename
6621344