Selecting Ensemble Members in Ensemble Clustering Using Voting

Title in another language

Cluster ensemble selection using voting

Authors

Latifi Pakdehi, Alireza (Shahid Rajaee Teacher Training University); Daneshpour, Negin (Shahid Rajaee Teacher Training University)

Number of pages

From page

To page

Keywords

Ensemble clustering, quality evaluation indices, member selection

Abstract (translated from Persian)

Ensemble clustering combines the results of existing clusterings. Research over the last decade shows that if, instead of combining all clusterings, only a subset of them is selected based on quality and diversity, the output of ensemble clustering is considerably more accurate. This paper presents a new method for selecting clusterings based on the two criteria of quality and diversity. To this end, different clusterings are first generated with the k-means algorithm, where k is a random number in each run. The clusterings produced in this way are then grouped by a new algorithm that operates on the degree of similarity between clusterings, so that clusterings that resemble one another fall into the same group. Then, from each group, the highest-quality member is selected by a voting-based method to build the ensemble clustering. In this paper, three functions, HGPA, CSPA, and MCLA, are used to combine the clusterings. Finally, real data from the UCI repository are used to test the new method. The results show that the new method is more effective and more accurate than previous methods.

English abstract

Clustering is the process of dividing a dataset into subsets, called clusters, so that objects within a cluster are similar to one another and different from objects in other clusters. Many clustering algorithms based on different approaches have been developed. An effective choice is to combine two or more of these algorithms to solve the clustering problem. Ensemble clustering combines the results of existing clusterings to achieve better performance and higher accuracy. Research over the last decade shows that, instead of combining all existing clusterings, selecting only a subset of them based on quality and diversity makes the result of ensemble clustering more accurate. This paper proposes a new method for ensemble clustering based on quality and diversity. First, many different base clusterings are needed for combination; they are generated by the k-means algorithm with a random k in each execution. The base clusterings are then put into groups according to their similarity using a new grouping method, so that clusterings that are similar to each other fall into the same group. In this step, normalized mutual information (NMI) or the adjusted Rand index (ARI) is used to compute the similarity and dissimilarity between base clusterings. Next, the best-qualified clustering in each group is selected by a voting-based method. In this method, cluster validity indices measure the quality of a clustering: all members of a group are evaluated by the cluster validity indices, and the clustering that optimizes the largest number of indices is selected. Finally, a consensus function combines all selected clusterings. A consensus function is an algorithm that combines existing clusterings to produce the final clusters. In this paper, three consensus functions, CSPA, MCLA, and HGPA, are used to combine the clusterings.
To evaluate the proposed method, real datasets from the UCI repository are used. In the experiments, the proposed method is compared with well-known and powerful existing methods. The experimental results demonstrate that the proposed algorithm has better performance and higher accuracy than previous work.
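
The grouping and voting steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the grouping algorithm or name the validity indices, so the greedy threshold grouping (`group_by_similarity`), the two stand-in indices (`wcss`, `min_separation`), the threshold value, and the toy data are all assumptions made for illustration. Base clusterings are given by hand rather than produced by k-means, and the final consensus step (CSPA, MCLA, or HGPA) is omitted.

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information of two label lists (arithmetic-mean normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    entropy = lambda c: -sum(v / n * math.log(v / n) for v in c.values())
    mi = sum(v / n * math.log(v * n / (ca[x] * cb[y])) for (x, y), v in cab.items())
    denom = (entropy(ca) + entropy(cb)) / 2
    return 1.0 if denom == 0 else mi / denom

def group_by_similarity(clusterings, threshold):
    """Assumed stand-in for the paper's grouping method: a clustering joins the
    first group whose first member is at least `threshold` similar to it."""
    groups = []
    for idx, labels in enumerate(clusterings):
        for g in groups:
            if nmi(clusterings[g[0]], labels) >= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups

def centroids(points, labels):
    sums, counts = {}, Counter(labels)
    for p, l in zip(points, labels):
        acc = sums.setdefault(l, [0.0] * len(p))
        for d, x in enumerate(p):
            acc[d] += x
    return {l: [s / counts[l] for s in acc] for l, acc in sums.items()}

def wcss(points, labels):
    """Hypothetical validity index: within-cluster sum of squares (lower is better)."""
    c = centroids(points, labels)
    return sum(math.dist(p, c[l]) ** 2 for p, l in zip(points, labels))

def min_separation(points, labels):
    """Hypothetical validity index: smallest centroid distance (higher is better)."""
    c = list(centroids(points, labels).values())
    return min((math.dist(a, b) for i, a in enumerate(c) for b in c[i + 1:]), default=0.0)

def select_by_voting(points, clusterings, group, indices):
    """Each index votes for the group member it rates best; most votes wins."""
    votes = Counter()
    for score, higher_is_better in indices:
        pick = max if higher_is_better else min
        votes[pick(group, key=lambda i: score(points, clusterings[i]))] += 1
    return max(group, key=lambda i: votes[i])

# Toy data: two well-separated blobs and four hand-made base clusterings.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusterings = [
    [0, 0, 0, 1, 1, 1],  # matches the blobs
    [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    [0, 0, 1, 1, 1, 1],  # one point misassigned
    [0, 1, 0, 1, 0, 1],  # unrelated to the blobs
]
indices = [(wcss, False), (min_separation, True)]

groups = group_by_similarity(clusterings, threshold=0.4)
selected = [select_by_voting(points, clusterings, g, indices) for g in groups]
print(groups, selected)
```

In this sketch the selected clusterings would then be passed to a consensus function. Ties in the vote are broken by position in the group, which is another arbitrary choice.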

Publication year

1397

Journal title

Signal and Data Processing (پردازش علائم و داده ها)

PDF file

7500388

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1017926