Article title:
Investigating the use of clustering to reduce the execution time of raster aggregation queries inside a spatial database. Case study: precipitation rasters
Title in other language:
Investigation on using clustering to reduce In-Database Sum query execution time for spatial rasters: A case study for precipitation rasters
Authors:
Sadidi, Javad (Kharazmi University, Tehran, Faculty of Geographical Sciences, Department of Remote Sensing and GIS); Sahebi Vayghan, Saeideh (Kharazmi University, Tehran, Faculty of Geographical Sciences, Department of Remote Sensing and GIS); Rezaeian, Hani (Kharazmi University, Tehran, Faculty of Geographical Sciences, Department of Remote Sensing and GIS)
Holdings:
Quarterly, year 1396 (Solar Hijri calendar), issue 103 (serial number)
Keywords:
Aggregation optimization, Approximate query processing, In-Database processing, Raster analysis, Sum, Precipitation rasters
Persian abstract:
In recent years, advances in data collection and management technologies have given rise to very large databases. By their nature, many analytical queries require aggregating and summarizing large portions of the data being analyzed. The main issue in the database field is efficient query processing, especially in real-time systems that require an immediate answer so that the user does not spend a long time waiting for a response. AQP (Approximate Query Processing) is an alternative approach to query processing in environments where producing an exact answer is time-consuming: it aims to provide an estimated answer, and it reduces response time by eliminating or reducing the number of accesses to the base data. In-Database processing [2] improves the performance of computer networks and supports the appropriate design of queries with relatively fast and accurate results.
In this study, the aggregation operation (Sum) was performed in a PostgreSQL database on precipitation raster data using two methods: the conventional method and the proposed optimized method. The results show that the Sum function with clustering runs 27.2 times faster than without clustering, and the mean numerical difference between the pixels produced by the optimized Sum and those produced by the conventional Sum is 0.028. The mean execution times of the conventional and optimized Sum queries are 211 and 7.754 seconds, respectively, which indicates the efficiency of the method proposed in this research. The findings, namely a significant reduction in the response time of in-database analyses on raster data, can be applied to real-time web services such as meteorology and traffic, which require immediate analysis and instantaneous answers.
English abstract:
Introduction: In recent years, advances in data collection and management technology have led to the creation of very large databases. Unlike other data types such as numbers and strings, raster data are complex and have special characteristics, so they are classified as “big data”. By their nature, spatial analysis queries often need to aggregate or summarize large portions of the data being analyzed. The main issue in the database field is efficient query processing, so that users do not wait a long time for results. Traditional query processing returns exact answers, but these answers take longer than real-time systems can afford. Notably, query running time is sometimes much more important than accuracy, especially in real-time services.
AQP (Approximate Query Processing) is an alternative query-processing method for time-consuming environments that enables the system to provide fast approximate answers. One of the most significant applications of AQP is query optimization. AQP can play a valuable role in speeding up spatial queries over large, complex data. It is also an efficient way to identify the data actually needed and, consequently, to minimize the cost of aggregation queries. Approximation methods have been used in decision support systems since the 1980s, and over the past decade AQP has attracted attention as a way to address several problems in the database field. Current techniques across various research fronts are useful only for relational database systems (Azevedo et al., 2007). The main idea behind in-database processing is to avoid transferring large data sets to separate applications. Because all analyses are performed inside the database, in-database processing offers fast execution, scalability, and security. Hence, in-database processing improves computer network performance and contributes to the design of fast-response queries.
Methodology: This research compares the traditional Sum aggregation operation with an optimized one, with the aim of reducing the running time of spatial queries in a PostgreSQL database. The research used 60 precipitation rasters. The study area is located in Lorestan province, and precipitation gauging stations provided the primary data. The rasters were created from monthly precipitation data for 2010-2014 using Kriging interpolation and loaded into the PostgreSQL database with the raster2pgsql loader. The raster pixels were then stored in their corresponding tables. In the optimized aggregation method, the raster data are first clustered using a custom-written similarity function. The functions were written in PL/pgSQL in PostGIS. The optimized Sum is executed in the following steps: creating the similarity function, executing it, running the optimized query, and obtaining the approximate result.
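The abstract does not define the similarity function used for clustering. As a minimal sketch only, assuming each raster is a NumPy array of the same shape and that similarity is measured by the mean absolute per-pixel difference, the clustering step could look like the following in Python. The helper names `mean_abs_diff` and `cluster_rasters`, the greedy "leader" grouping, and the threshold value are all hypothetical; the authors' own functions were written in PL/pgSQL inside PostGIS.

```python
import numpy as np

def mean_abs_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical dissimilarity measure: mean absolute per-pixel difference."""
    return float(np.mean(np.abs(a - b)))

def cluster_rasters(rasters, threshold):
    """Greedy leader clustering (an illustrative stand-in, not the authors' method).

    Each raster joins the first cluster whose leader (first member) lies within
    `threshold`; otherwise it starts a new cluster. Returns lists of indices.
    """
    clusters = []
    for i, raster in enumerate(rasters):
        for members in clusters:
            if mean_abs_diff(rasters[members[0]], raster) <= threshold:
                members.append(i)
                break
        else:  # no existing cluster was close enough
            clusters.append([i])
    return clusters

# Synthetic 4x5 "monthly" rasters: three wet months and two dry months.
gradient = np.arange(20, dtype=float).reshape(4, 5)
rasters = [gradient + offset for offset in (80, 82, 79, 5, 6)]
clusters = cluster_rasters(rasters, threshold=5.0)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Leader clustering is order-dependent and compares each raster only with each cluster's first member; it is used here for brevity, not because the source specifies it.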
Next, one raster is selected from each cluster and multiplied by the number of rasters in that cluster. The resulting raster enters the Sum function as the cluster's representative. In each cluster, the number of arithmetic operations is reduced by (number of rasters in the cluster - 1) × rows × columns of the raster. This substantially reduces the number of arithmetic operations and yields fast approximate answers. For accuracy assessment, the error of each method was estimated by calculating the mean relative error, the DI (difference indicator) error, and the relative error for each raster. Finally, the results were analyzed.
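The cluster-representative approximation and the operation count above can be illustrated with a short NumPy sketch. It assumes, hypothetically, that the first member of each cluster is the representative and that the clusters are already known (hard-coded below). The function names are illustrative, and the per-pixel relative error |approx - exact| / |exact| is a standard definition assumed here, since the abstract does not spell out its formulas; the DI error is not reproduced.

```python
import numpy as np

def exact_sum(rasters):
    """Traditional Sum: add every raster pixel by pixel."""
    return np.sum(np.stack(rasters), axis=0)

def approx_sum(rasters, clusters):
    """Approximate Sum: scale one representative per cluster by the cluster size.

    Hypothetical choice: the first member of each cluster is the representative.
    """
    total = np.zeros_like(rasters[0], dtype=float)
    for members in clusters:
        total += len(members) * rasters[members[0]]
    return total

def additions_saved(clusters, rows, cols):
    """Reduction per the abstract: sum over clusters of (n_k - 1) * rows * cols."""
    return sum((len(members) - 1) * rows * cols for members in clusters)

def relative_error(approx, exact):
    """Per-pixel |approx - exact| / |exact| (assumed definition; exact must be non-zero)."""
    return np.abs(approx - exact) / np.abs(exact)

# Synthetic 4x5 rasters and a hypothetical clustering result.
gradient = np.arange(20, dtype=float).reshape(4, 5)
rasters = [gradient + offset for offset in (80, 82, 79, 5, 6)]
clusters = [[0, 1, 2], [3, 4]]

exact = exact_sum(rasters)
approx = approx_sum(rasters, clusters)
print(additions_saved(clusters, rows=4, cols=5))  # 60
print(float(np.max(np.abs(approx - exact))))      # 2.0
print(float(np.mean(relative_error(approx, exact))))
```

Read as a count of additions, the stated formula matches this sketch: summing N rasters takes (N - 1) × rows × columns additions, and summing one representative per cluster removes (n_k - 1) × rows × columns of them per cluster. The scaling step itself costs rows × columns multiplications per cluster, which that count does not include.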
Note that the user can decide whether the resulting accuracy is acceptable for a particular project or whether an exact query must be executed.
Results and discussion: Five scenarios were implemented to compare the traditional and optimized Sum functions. The results show that the optimized Sum function is 27.2 times faster than the traditional one. The average difference in pixel values between the traditional and optimized results is 0.028. The mean query running times for the optimized and traditional Sum are 7.754 and 211 seconds, respectively, which demonstrates the efficiency of the optimized Sum.
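As a quick arithmetic check, the reported 27.2-fold speedup follows directly from the two mean running times; the same figures also give the fraction of query time saved.

```python
# Mean running times reported in the abstract, in seconds.
traditional_s = 211.0
optimized_s = 7.754

speedup = traditional_s / optimized_s
time_saved_pct = 100 * (1 - optimized_s / traditional_s)
print(f"speedup: {speedup:.1f}x")            # speedup: 27.2x
print(f"time saved: {time_saved_pct:.1f}%")  # time saved: 96.3%
```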
Note that the accuracy of the optimized method depends on the nature of the rasters used and on how homogeneous or heterogeneous they are.
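Why homogeneity matters can be seen in a toy example: if every raster in a cluster equals the representative, the scaled representative reproduces the exact sum, and the error grows as members drift apart. The sketch below assumes, hypothetically, that the first member is the representative; `cluster_error` is an illustrative name and the values are synthetic, not data from the study.

```python
import numpy as np

def cluster_error(members):
    """Largest per-pixel error when n * (first member) replaces the exact cluster sum."""
    exact = np.sum(np.stack(members), axis=0)
    approx = len(members) * members[0]
    return float(np.max(np.abs(approx - exact)))

base = np.array([[10.0, 20.0], [30.0, 40.0]])
for spread in (0.0, 1.0, 5.0):
    # Three synthetic members that drift apart by `spread` per step.
    members = [base + k * spread for k in range(3)]
    print(spread, cluster_error(members))  # error = 3 * spread here
```

A perfectly homogeneous cluster (spread 0) gives zero error, while the error rises linearly with the spread between members.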
This substantial reduction in in-database spatial query running time can be used to offer real-time web-based services, such as meteorology and traffic, that require real-time analysis and fast responses.
Journal title:
Sepehr Geographical Information