Regional Information Center for Science and Technology (RICEST) - A Consensus Time Series Clustering Method Based on Fuzzy C-Means and the Particle Swarm Algorithm

Record number :

1234842

Article title :

A Consensus Time Series Clustering Method Based on Fuzzy C-Means and the Particle Swarm Algorithm

Title in another language :

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

Authors :

Izakian, Zahedeh, K. N. Toosi University of Technology - Faculty of Surveying Engineering; Saadi Mesgari, Mohammad, K. N. Toosi University of Technology - Faculty of Surveying Engineering

Number of pages :

From page :

From page (continued) :

To page :

To page (continued) :

Keywords :

Data mining, Clustering, Distance function, Time series, Normalized mutual information, Particle swarm algorithm

Persian abstract (translated to English) :

In recent years, advances in data collection technologies and the availability of huge volumes of complex data such as time series have increased the need for suitable methods to analyze this kind of data. Among the available data mining methods, data clustering, which aims to simplify large datasets and extract useful information, has attracted the attention of many computer science researchers. Choosing a distance function is one of the most important challenges to address before time series clustering begins. Selecting an appropriate distance function for a dataset depends on understanding the nature of the data before clustering, and is therefore complex and time-consuming. At the same time, various distance functions with different characteristics and strengths have been proposed for measuring the difference/similarity between time series. The main challenge of this research is how to devise a clustering method that can exploit the properties of several distance functions simultaneously, without requiring knowledge of the nature of the data before clustering starts. To solve this problem, this research proposes a clustering method that combines Fuzzy C-Means (FCM) clustering with the well-known swarm-intelligence algorithm Particle Swarm Optimization (PSO), so that different distance functions with different weights are used during the clustering process. The objective function in this study was chosen so that the clustering result has the greatest overlap with the clustering results obtained from the different distance functions. In other words, the proposed method is a consensus clustering method whose result reflects an agreement among the different distance functions. The proposed method was implemented with three distance functions on seven well-known time series datasets and compared with five other methods. The comparison showed that the proposed method performed better than the other methods in more than 85 percent of cases.
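The "overlap" between a candidate clustering and the clusterings obtained from individual distance functions can be quantified in several ways. The following is a minimal Python sketch, assuming normalized mutual information (NMI, listed among the keywords) with arithmetic-mean normalization as the agreement measure. The function names `normalized_mutual_information` and `average_agreement` are hypothetical, and the record does not state that the authors compute agreement exactly this way.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two hard partitions, normalized as 2*I(A;B) / (H(A) + H(B)).

    Assumption: arithmetic-mean normalization; other normalizations exist.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        # Both partitions put everything in one cluster: treat as identical.
        return 1.0
    return 2.0 * mutual_info / (h_a + h_b)


def average_agreement(candidate, reference_partitions):
    """Mean NMI between one candidate partition and several reference partitions."""
    scores = [normalized_mutual_information(candidate, ref)
              for ref in reference_partitions]
    return sum(scores) / len(scores)
```

NMI does not depend on how clusters are numbered, so two partitions that differ only by a relabeling score 1, while statistically independent partitions score 0.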

English abstract :

In recent years, advances in data-gathering technologies such as GPS and GSM networks have produced huge, complex datasets such as time series and trajectories. As a result, appropriate methods are needed to analyze these large raw datasets. Extracting useful information from large datasets has always been one of the most important challenges across the sciences, and data mining techniques provide useful solutions to this problem. Clustering, as the most widely used data mining task, has attracted the attention of many researchers in various fields. Because of its many applications, time series clustering has become highly popular, and many approaches have been presented in this field. An effective clustering method groups data so that objects in the same cluster are more similar to each other than to objects in different clusters. To compute the difference/similarity between time series during clustering, a similarity measure or distance function is used. Choosing an appropriate distance function is therefore one of the most important challenges to address before starting the clustering process. So far, various distance functions have been proposed to measure the difference/similarity between time series, each with its own strengths and weaknesses. Since choosing a suitable distance function for clustering a specific dataset is a complicated process, this study proposes a clustering method that combines the well-known Fuzzy C-Means (FCM) method with Particle Swarm Optimization (PSO) and can use different distance functions in the time series clustering process. In this way, the step of choosing the best distance function before clustering is eliminated, and different similarity measures can participate in the clustering process with different weights.
The objective function in this study is based on the Fuzzy C-Means clustering objective function, and the Particle Swarm Optimization algorithm is used to find its optimal value. Using three distance functions (Euclidean distance, dynamic time warping, and the Pearson correlation coefficient), the proposed method was applied to seven well-known UCR time series datasets. With average normalized mutual information as the performance criterion, the proposed method was compared with five other methods. The comparison indicated that the proposed method performed better than the other methods in more than 85% of cases. For a more rigorous evaluation, Tukey's multiple comparison test, which compares the methods pairwise, was applied with a threshold of p < 0.05. The Tukey test showed that, in about 83% of cases, the differences between the results of the proposed method and those of the other five techniques are statistically significant. Overall, the results showed that the proposed clustering method produces higher-quality clusters than the compared methods.
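To make the ingredients named in the abstract concrete, here is a minimal Python sketch of the three distance functions, a weighted combination of them, and the standard FCM membership update. Several points are assumptions rather than details from the record: the linear weighted sum in `combined_distance`, the use of `1 - r` to turn the Pearson correlation into a distance, the absolute-difference local cost in DTW, and all function names. In the described method the weights would be searched by PSO, which is not shown here, and rescaling the distances to comparable ranges is also left out.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(x, y):
    """Classic dynamic time warping with |a - b| as the local cost."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in x only
                                     cost[i][j - 1],      # step in y only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def pearson_distance(x, y):
    """Distance 1 - r, where r is the Pearson correlation (assumed conversion)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def combined_distance(x, y, weights):
    """Weighted average of the three distances (hypothetical combination rule)."""
    parts = (euclidean_distance(x, y), dtw_distance(x, y), pearson_distance(x, y))
    return sum(w * d for w, d in zip(weights, parts)) / sum(weights)


def fcm_memberships(distances, fuzzifier=2.0):
    """Standard FCM membership of one series, given its distance to each center."""
    exponent = -2.0 / (fuzzifier - 1.0)
    inverse = [max(d, 1e-12) ** exponent for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]
```

For example, `fcm_memberships([1.0, 3.0])` assigns a much larger membership to the first (closer) center, and `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because warping absorbs the repeated value.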

Publication year :

1399 (Solar Hijri)

Journal title :

علوم و فنون نقشه برداري (Surveying Sciences and Techniques)

PDF file :

8451428

Link to this record :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=8&DC=1234842