Keywords:
Time series data mining, clustering, nearest neighbor, longest common subsequence, dynamic time warping
Persian abstract:
Data mining techniques are designed specifically for static data, so applying them to time series data requires modifications (in particular, to the similarity measurement method). According to recent research, the longest common subsequence and dynamic time warping methods are among the most widely used and most efficient of these methods. In this study, we evaluate and compare the performance of these two methods in the nearest-neighbor and k-medoids clustering techniques, so that they can be applied more accurately in these techniques and in problems such as customer segmentation, workshop scheduling, and so on. For this purpose, 63 time series data sets from the UCR database are used. The results show that the two methods differ significantly in their effect on the accuracy of identifying the correct time series class and on clustering accuracy, but do not differ significantly in their effect on determining the number of clusters and the cluster representative.
English abstract:
Today, the use of data mining techniques such as classification, clustering, frequent pattern discovery, and outlier detection is increasing in many domains, including production, medicine, social science, meteorology, the stock exchange, sales, and customer service. Data mining techniques are designed specifically for static data, so using them on time series data requires some modifications to the respective algorithms. One of these modifications is the choice of an appropriate similarity measurement method, because similarity measurement is used in all data mining techniques. In this research, we therefore evaluate and compare the effect of two commonly used and efficient time series similarity measurement methods on the quality of data mining results. These methods are the Longest Common Subsequence (LCSS) method and the Dynamic Time Warping (DTW) method. The main purpose of this research is to compare the performance of these methods in time series data mining. The data mining techniques used in this research are the nearest-neighbor technique and the k-medoids clustering algorithm. The performance evaluation process is described in the text. This process uses the nearest-neighbor technique to calculate the accuracy of identifying the correct time series class, and uses the k-medoids clustering technique to calculate the clustering accuracy, the ability to correctly determine the number of clusters, and the ability to determine the better cluster representative. For this purpose, we use 63 time series data sets selected at random from a world-renowned database, the UCR collection. The results show that the LCSS method performs significantly better than the DTW method in terms of class identification accuracy and clustering accuracy, at the 99% and 92.5% confidence levels, respectively, but that there is no significant difference between them in terms of their effect on determining the number of clusters and the cluster representatives. The results of this research help in applying these methods more accurately in appropriate data mining techniques, for problems such as customer segmentation, workshop scheduling, and the like.
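
To make the two similarity measures concrete, the following is a minimal Python sketch of textbook DTW and LCSS distances, plugged into a one-nearest-neighbor classifier. It is not the authors' implementation. The function names, the matching threshold `epsilon`, the absolute-difference local cost, and the normalization of LCSS by the shorter series length are all assumptions made for illustration, and no warping-window constraint is applied.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance with |x - y| as the local cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to an already used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned to an already used a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def lcss_distance(a, b, epsilon=0.5):
    """1 - LCSS / min(len); points match when within epsilon (assumed value)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 1.0
    # table[i][j] = length of the longest common subsequence of a[:i], b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return 1.0 - table[n][m] / min(n, m)


def nearest_neighbor_label(query, train, distance):
    """Return the label of the training series closest to `query`.

    `train` is a list of (series, label) pairs; `distance` is any function
    taking two series, such as dtw_distance or lcss_distance.
    """
    _, label = min(train, key=lambda item: distance(query, item[0]))
    return label


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the UCR collection).
    train = [([0, 0, 1, 0], "spike"), ([0, 1, 2, 3], "ramp")]
    query = [0, 0, 0, 1, 0]
    print(nearest_neighbor_label(query, train, dtw_distance))
    print(nearest_neighbor_label(query, train, lcss_distance))
```

Because the classifier takes the distance function as a parameter, the same pairwise distances could also feed a k-medoids routine, which needs only a distance between series and not a mean of them.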