پديد آورندگان :
مويدي، علي دانشگاه تهران - پرديس دانشكده هاي فني - دانشكده مهندسي نقشه برداري و اطلاعات مكاني , عباسپور، رحيم علي دانشگاه تهران - پرديس دانشكده هاي فني - دانشكده مهندسي نقشه برداري و اطلاعات مكاني , چهرقان، عليرضا دانشگاه صنعتي سهند - دانشكده مهندسي معدن
كليدواژه :
خطوط سير مكاني , زمان محاسباتي , خوشه بندي , شاخص سيلووت
چكيده فارسي :
در سالهاي اخير، رشد بالا و روزافزون داده هاي خطوط سير مكاني و لزوم پردازش و استخراج اطلاعات مفيد و الگوهاي معني دار از آن ها منجر به جلب توجه محققان بسياري در زمينه خوشه بندي خطوط سير مكاني-زماني شده است. تاكنون توابع شباهت و الگوريتم هاي خوشه بندي مختلفي براي طبقه بندي خطوط سير ارائه شده اند. گستردگي الگوريتم هاي خوشه بندي و نتايج منحصر به فرد هر يك بر لزوم توجه و بررسي نقاط ضعف و قوت آن ها تاكيد مي كند. در اين تحقيق، الگوريتم هاي خوشه بندي در خطوط سير مكاني كه تعميم يافته از الگوريتم هاي خوشه بندي داده هاي نقطه اي هستند به چهار دسته ي كلي روش هاي افرازي، سلسله مراتبي، چگالي مبنا و مبتني بر بهينه سازي تقسيم شدند و پركاربردترين الگوريتم ها در هر دسته پياده سازي و مورد ارزيابي قرار گرفتند. فرايند ارزيابي بر روي دو مجموعه داده با پيچيدگي متفاوت و در سه حالت بدون خطا، خطا با توزيع گوسين و وجود داده پرت انجام گرفته تا توانايي روش ها در شرايط مختلف بررسي گردد. از شاخص سيلووت و زمان محاسباتي به عنوان دو پارامتر براي مقايسه و ارزيابي استفاده شده است. با توجه به نتايج به دست آمده توجه به داده و ويژگي هاي آن در انتخاب روش مناسب خوشه بندي حائز اهميت است. با اين حال در مجموع بهترين نتايج از لحاظ كيفيت خوشه بندي به ترتيب از دسته هاي مبتني بر بهينه سازي، افرازي، سلسله مراتبي و چگالي مبنا و از لحاظ سرعت محاسبات به ترتيب دسته هاي چگالي مبنا، سلسله مراتبي، افرازي و مبتني بر بهينه سازي حاصل شده است. دسته افرازي (صرفا زير دسته طيفي) بالاترين مقاومت در برابر داده پرت و روش هاي چگالي مبنا و مبتني بر بهينه سازي بالاترين مقاوت در برابر نويز را از خود نشان داده اند.
چكيده لاتين :
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information which is able to respond to different challenges in real-world applications such as traffic management and control, smart transportation, surveillance, security and biological studies. Clustering is one of the most important methods for trajectory pattern extraction, their volume reduction, discovering outliers in trajectories, indexing and their simple visualization. So far, different similarity functions and clustering algorithms have been proposed for trajectory clustering. The diversity of clustering algorithms and their unique results highlights the need for paying attention to their weaknesses and strengths. Some clustering algorithms are only effective on low volume datasets. There are also some algorithms which are only able to extract clusters with convex shape, whereas some of them extract clusters of any shapes. On the other hand, several clustering functions require the determination of the initial value, such as the number of clusters by the users while some others do not need initial inputs. In addition, outlier detection is not possible in all clustering algorithms. In this study, spatial trajectories clustering algorithms that are extended from point clustering algorithms is divided into four general categories: partitioning-based clustering, hierarchical clustering, optimization-based clustering and density-based clustering. Then, the most commonly used algorithms in each category are implemented and evaluated. The evaluation process is performed on two sets of data (cross and i5) with dissimilar complexity. The effect of noise and outliers is one of the most critical parameters engaged in the performance quality of clustering functions which is considered in this study. The Silhouette index and computational time are used as two parameters for comparison and evaluation. According to obtained results, it is crucial to consider the data, its features, and also the utilized distance function in order to decide on the proper clustering method. However, generally, the best results regarding the clustering quality are obtained from optimization-based clustering. With the integration of genetic algorithm into the K-means, all results in two cases of using both two datasets and using two different distance functions are improved. Using the genetic algorithm in K-means leads to finding the optimum location of cluster centers and dealing with the local minimum problem. It is important to note that high computational time is one of the weaknesses of optimization-based clustering. After the optimization-based clustering, regarding the clustering quality, partitioning-based, hierarchical and density-based clustering have achieved the second, third and fourth ranks respectively. With regard to the computational time, the best results are obtained from the density-based, hierarchical, partitioning-based and optimization-based clustering consecutively. Some methods such as K-means (a sub-category of partitioning-based clustering) are severely sensitive to outliers while spectral sub-category of partitioning-based clustering has a high resistance against them. Moreover, the density-based and optimization-based clustering methods have the highest tolerance against noise.