DocumentCode :
3027374
Title :
Automatic Clustering Assessment through a Social Tagging System
Author :
Cunha, Eugenia ; Figueira, A.
Author_Institution :
CRACS, Univ. do Porto, Porto, Portugal
fYear :
2012
fDate :
5-7 Dec. 2012
Firstpage :
74
Lastpage :
81
Abstract :
Assessing the quality of the clustering process is fundamental in unsupervised clustering. The literature describes three families of clustering validity techniques: external criteria, internal criteria, and relative criteria. In this paper, we focus on external criteria and present an algorithm that allows external measures to be used to assess clustering quality when the structure of the data set is unknown. To obtain an automatic partition of a data set that reflects how documents should be grouped according to human intuition, we use internal information present in the data, such as descriptions provided by users as tags and the distance between documents. The results show a clear correlation between manual and automatic classes, indicating that it is acceptable to use an automatic partition. In addition to presenting an alternative way of finding the structure of the data set using meta-data such as tags, we also test the impact of integrating tags into the k-means++ algorithm and verify how this influences the quality of the formed clusters, suggesting a model of integration based on the occurrence of tags in document content. The experimental results indicate a positive impact when external measures are calculated, although there was no apparent correlation between the weight assigned to the tags and the quality of the obtained clusters.
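The abstract refers to external measures without naming them, and this record does not reproduce the paper's algorithm. As an illustration only, the sketch below computes two widely used external measures, purity and the Rand index, by comparing a clustering result against a reference partition. The function names and the toy data are hypothetical; the reference labels stand in for a partition derived from tags, which is an assumption and not the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def purity(reference, predicted):
    """Fraction of items that belong to the majority reference class of their cluster."""
    clusters = {}
    for ref, pred in zip(reference, predicted):
        clusters.setdefault(pred, Counter())[ref] += 1
    return sum(max(counts.values()) for counts in clusters.values()) / len(reference)


def rand_index(reference, predicted):
    """Fraction of item pairs on which the two partitions agree (same/same or different/different)."""
    pairs = list(combinations(range(len(reference)), 2))
    agree = sum(
        (reference[i] == reference[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six documents, a reference partition (assumed here to
# come from tags), and the cluster ids returned by some clustering run.
reference = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 1, 1]

print(f"purity     = {purity(reference, predicted):.3f}")
print(f"rand index = {rand_index(reference, predicted):.3f}")
```

Both measures equal 1.0 when the clustering reproduces the reference partition exactly. Neither requires the cluster ids to match the class names, only the grouping.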
Keywords :
document handling; meta data; pattern clustering; social networking (online); statistical analysis; automatic classes; automatic clustering assessment; automatic partitioning; clustering process quality assessment; document content grouping; external criteria; human intuition; internal criteria; internal information; k-means++ algorithm; manual classes; meta-data; relative criteria; social tagging system; tag weight assignment; unsupervised clustering validity techniques; Clustering algorithms; Communities; Humans; Manuals; Partitioning algorithms; Tagging; Vectors; cluster validity; clustering; effectiveness; quality assessment; tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on
Conference_Location :
Nicosia
Print_ISBN :
978-1-4673-5165-2
Electronic_ISBN :
978-0-7695-4914-9
Type :
conf
DOI :
10.1109/ICCSE.2012.20
Filename :
6417277