• DocumentCode
    3027374
  • Title

    Automatic Clustering Assessment through a Social Tagging System

  • Author

    Cunha, Eugenia ; Figueira, A.

  • Author_Institution
    CRACS, Univ. do Porto, Porto, Portugal
  • fYear
    2012
  • fDate
    5-7 Dec. 2012
  • Firstpage
    74
  • Lastpage
    81
  • Abstract
    Assessing the quality of the clustering process is fundamental in unsupervised clustering. The literature describes three clustering validity techniques: external criteria, internal criteria and relative criteria. In this paper, we focus on external criteria and present an algorithm that allows external measures to be used to assess clustering quality when the structure of the data set is unknown. To obtain an automatic partition of a data set that reflects how documents should be grouped according to human intuition, we use internal information present in the data, such as the descriptions provided by users as tags and the distance between documents. The results show an evident correlation between manual and automatic classes, indicating that it is acceptable to use an automatic partition. In addition to presenting an alternative way of finding the structure of the data set using meta-data such as tags, we also test the impact of integrating tags into the k-means++ algorithm and verify how this influences the quality of the formed clusters, suggesting a model of integration based on the occurrence of tags in document content. The experimental results indicate a positive impact when external measures are calculated, although there was no apparent correlation between the weight assigned to the tags and the quality of the obtained clusters.
  • Keywords
    document handling; meta data; pattern clustering; social networking (online); statistical analysis; automatic classes; automatic clustering assessment; automatic partitioning; clustering process quality assessment; document content grouping; external criteria; human intuition; internal criteria; internal information; k-means++ algorithm; manual classes; meta-data; relative criteria; social tagging system; tag weight assignment; unsupervised clustering validity techniques; Clustering algorithms; Communities; Humans; Manuals; Partitioning algorithms; Tagging; Vectors; cluster validity; clustering; effectiveness; quality assessment; tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on
  • Conference_Location
    Nicosia
  • Print_ISBN
    978-1-4673-5165-2
  • Electronic_ISBN
    978-0-7695-4914-9
  • Type

    conf

  • DOI
    10.1109/ICCSE.2012.20
  • Filename
    6417277