Regional Information Center for Science and Technology - Finding Optimal Parameters for the ADBSCAN Clustering Algorithm Using a Genetic Algorithm

Conference record number:

5412

Paper title:

يافتن پارامترهاي بهينه براي الگوريتم خوشه‌بندي ADBSCAN با استفاده از الگوريتم ژنتيك

Title in another language:

Finding Optimal Parameters for ADBSCAN Clustering Algorithm using Genetic Algorithm

Authors:

Motahareh Entezami (m.entezami.98@gmail.com), M.Sc. in Computer Science, Computer Science, Vali-e-Asr University, Rafsanjan; Ali Shakiba (ali.shakiba@vru.ac.ir), Assistant Professor, Department of Computer Science, Vali-e-Asr University, Rafsanjan

Number of pages:

Keywords:

Density-based clustering, ADBSCAN, Genetic algorithm

Publication year:

1402 (Solar Hijri calendar)

Conference title:

9th International Conference on Web Research

Document language:

Persian

Persian abstract (translated into English):

Clustering is a process that partitions a set of objects into disjoint groups, each of which is called a cluster. In a clustering, the members of each cluster should be similar to one another in terms of their features, and the similarity between instances in different clusters should be low. In general, clustering algorithms follow a partitioning, hierarchical, density-based, or model-based approach, or a combination of these. ADBSCAN is a density-based algorithm for clustering data. It presents a new method for identifying local high-density instances using the inherent properties of the nearest-neighbor graph. The algorithm uses two parameters: k (the number of nearest neighbors) and the percentage of noise in the data set. These two parameters have a considerable effect on the result of the computation and the quality of the output, so both values need to be tuned as close to optimal as possible. Exhaustive search is one way to find the optimal values. To reduce search time, this paper uses genetic search to find optimal values for these parameters. With the proposed method, the ARI measure improved by 11.46 percent on average.
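The abstract reports its result in terms of ARI (the Adjusted Rand Index), which compares a predicted clustering against reference labels and corrects for chance agreement. The record does not say how the authors computed it; the following is only a minimal sketch of the standard pair-counting definition, and the function name `adjusted_rand_index` is chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the pair-counting definition.

    Illustrative helper (not taken from the paper). A score of 1.0 means
    the two partitions agree up to a renaming of the cluster labels;
    scores near 0 indicate chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    # Pairs of points that share a cell of the contingency table,
    # a reference cluster, or a predicted cluster, respectively.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    all_pairs = comb(len(labels_true), 2)

    # ARI = (index - expected) / (max_index - expected), where
    # expected = same_true * same_pred / all_pairs and
    # max_index = (same_true + same_pred) / 2.
    # Both sides are scaled by 2 * all_pairs to stay in integers.
    numerator = 2 * (same_both * all_pairs - same_true * same_pred)
    denominator = (same_true + same_pred) * all_pairs - 2 * same_true * same_pred
    if denominator == 0:
        # Degenerate case: both partitions are trivial and identical.
        return 1.0
    return numerator / denominator


# Same grouping under different label names scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A grouping that splits every reference cluster scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the score ignores label names, it can serve directly as a fitness value when tuning clustering parameters against labeled benchmark data.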

English abstract:

Clustering is the process of partitioning a set of objects into disjoint groups, each of which is called a cluster. Intuitively, the members of each cluster should be highly similar to one another in terms of their characteristics, while members of different clusters should have a low degree of similarity. In general, clustering algorithms follow a partitioning, hierarchical, density-based, or model-based approach, or a combination of these. ADBSCAN is a density-based clustering algorithm that presents a new method for identifying local high-density instances using the properties of the nearest-neighbor graph. The algorithm takes two parameters: k, the number of nearest neighbors, and the percentage of noise in the data set. These parameters have a significant effect on both the quality of the output and the required running time, so it is necessary to find optimal values for them. Brute-force search is a naive way to do so. Evolutionary algorithms such as genetic search can make the search easier and more efficient. In this paper, we apply a genetic algorithm to obtain optimal values of the parameters. On average, the proposed method led to an 11.46% improvement in the ARI criterion.
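The record gives no details of the genetic algorithm itself, so the following is only a hedged sketch of how a genetic search over the two parameters (k and the noise percentage) might be organized. Everything in it is an assumption made for illustration: the function name `genetic_search`, the operators (tournament selection, blend crossover, bounded mutation, elitism), and the default settings. The `fitness` callback is hypothetical; in the paper's setting it would run ADBSCAN with the given parameters and return the ARI of the resulting clustering, whereas a toy function stands in for it here.

```python
import random


def genetic_search(fitness, k_range, noise_range, pop_size=20,
                   generations=30, mutation_rate=0.2, seed=0):
    """Maximize fitness(k, noise) with a small genetic algorithm.

    Illustrative sketch only; the operators and defaults are assumptions,
    not the configuration used by the paper's authors.
    """
    rng = random.Random(seed)
    k_lo, k_hi = k_range
    n_lo, n_hi = noise_range

    def random_individual():
        return (rng.randint(k_lo, k_hi), rng.uniform(n_lo, n_hi))

    def crossover(a, b):
        # Take k from either parent; blend the two noise values.
        w = rng.random()
        return (rng.choice((a[0], b[0])), w * a[1] + (1 - w) * b[1])

    def mutate(individual):
        k, noise = individual
        if rng.random() < mutation_rate:
            k = min(k_hi, max(k_lo, k + rng.choice((-2, -1, 1, 2))))
        if rng.random() < mutation_rate:
            step = rng.gauss(0, 0.1 * (n_hi - n_lo))
            noise = min(n_hi, max(n_lo, noise + step))
        return (k, noise)

    def tournament(scored, size=3):
        # Best of a few randomly drawn individuals.
        return max(rng.sample(scored, size), key=lambda s: s[0])[1]

    population = [random_individual() for _ in range(pop_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(k, noise), (k, noise)) for k, noise in population]
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score > best_score:
            best_score, best = top_score, top
        # Elitism: carry the best individual found so far into the next generation.
        next_population = [best]
        while len(next_population) < pop_size:
            child = crossover(tournament(scored), tournament(scored))
            next_population.append(mutate(child))
        population = next_population
    return best, best_score


def toy_fitness(k, noise):
    """Hypothetical stand-in for 'ARI of ADBSCAN run with (k, noise)'."""
    return -((k - 12) ** 2) / 100 - (noise - 0.05) ** 2


best_params, best_value = genetic_search(toy_fitness, (2, 50), (0.0, 0.5))
print(best_params, best_value)
```

With these defaults the search evaluates the fitness at most pop_size × generations = 600 times. Whether that is cheaper than an exhaustive grid depends on how finely the noise percentage would otherwise be sampled.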

Country:

Iran

Link to this document:

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=358661