• DocumentCode
    3362954
  • Title
    A Parallel Comparator of Documents
  • Author
    Ksouri, Sonia Alouane ; Hidri, Minyar Sassi ; Barkaoui, Kamel
  • Author_Institution
    Ecole Nat. d'Ingénieurs de Tunis, Univ. Tunis El Manar, Tunis, Tunisia
  • fYear
    2013
  • fDate
    26-30 Aug. 2013
  • Firstpage
    48
  • Lastpage
    52
  • Abstract
    Clustering of documents, sentences, and words is a well-studied problem. Most existing algorithms cluster documents, sentences, and words separately rather than simultaneously. Moreover, when analyzing large textual corpora, the amount of data that can be processed on a single machine is usually limited by the available main memory, and growth in the data to be analyzed increases the computational workload. In this paper we present a parallel fuzzy triadic similarity measure, called PFT-Sim, that calculates fuzzy memberships in the context of document co-clustering and is based on a parallel programming architecture. It simultaneously computes fuzzy co-similarity matrices between documents and sentences and between sentences and words, each built on the basis of the others. The PFT-Sim model provides a parallel data analysis strategy and divides the similarity computation into parallel sub-tasks to address efficiency and scalability problems.
  • Keywords
    data analysis; fuzzy set theory; matrix algebra; parallel programming; pattern clustering; text analysis; PFT-Sim; document co-clustering; fuzzy co-similarity matrices; fuzzy memberships; large textual corpuses; parallel comparator; parallel data analysis strategy; parallel fuzzy triadic similarity measure; parallel programming architecture; parallel sub-tasks; sentence clustering; similarity computing task; words clustering; Clustering algorithms; Complexity theory; Computational modeling; Computer architecture; Data models; Parallel processing; Text mining; Document co-clustering; Fuzzy sets; Parallel computing; Text Mining; Three-partite graph; multi-thread architecture;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2013 24th International Workshop on
  • Conference_Location
    Los Alamitos, CA
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-5070-1
  • Type
    conf
  • DOI
    10.1109/DEXA.2013.13
  • Filename
    6621344