Article title:
Statistical Modeling of Soil Salinity over Extensive Areas
Title in another language:
Statistical Modeling of Soil Salinity on Large Scale
Authors:
Hasheminejad, Yousef , Homaee, Mehdi (Tarbiat Modares University, Department of Irrigation and Drainage) , Noroozi, Ali Akbar (Agricultural Research, Education and Extension Organization)
Keywords:
Validation , Multiple linear regression , Partial least squares regression , Remote sensing , Spectral indices
Persian abstract:
Soil salinization is spreading at an increasing rate worldwide, and agricultural production declines under this stress. In planning for adaptation to climate change and rising food demand, policy makers and decision makers need continuous quantitative monitoring of soil salinity. Spectral indices derived from satellite sensors or near-surface sensors are increasingly used to monitor soil salinity, and a large number of such indices have been introduced to date. Various regression methods have been used for modeling and for validating the resulting models; the most important are multiple linear regression (including stepwise regression, forward selection and backward elimination) and partial least squares regression. In this study, to evaluate these two methods for modeling variation in soil salinity, laboratory and electromagnetic measurements of soil salinity were used from 97 points in 1392 (2013) and 225 points in 1393 (2014) in part of the Sabzevar-Davarzan plain, covering about 50,000 hectares. Twenty-three spectral indices were extracted from Landsat 8 images for the sampling dates and used, together with a digital elevation model, as independent variables. Using the first-year data for training and the second-year data for testing, and vice versa, the different multiple linear regression methods produced coefficients of determination between about 22 and 88 percent, but this correlation did not exceed 29 percent in the validation set. Because of multicollinearity among the independent variables, multiple linear regression could not be applied to all variables. Removing the collinear variables, applying a log transformation and randomly assigning all data to training and test sets raised the regression coefficient of the model and its validity to an acceptable level. Partial least squares regression, applied to the original and log-transformed data with the first and second years as training and test sets and vice versa, also produced coefficients of determination between 39 and 85 percent in the training set, but was unable to predict in the test set. Randomizing the data and re-dividing them into training and test sets markedly improved the coefficient of determination in the validation set. 
Repeating the randomization showed that the method has the stability required for estimating the variable coefficients.
English abstract:
Introduction: Soil salinization is increasing across the developing world, and agricultural production is decreasing as a result of this stress. Climate change could worsen the salinization trend through decreased rainfall and increased evapotranspiration in arid regions. Policy and decision makers require continuous, quantitative monitoring of soil salinity to adapt to the adverse effects of climate change and the growing demand for food. Indices derived from near-surface or satellite-based sensors are increasingly applied to monitor soil salinity, and a considerable number of such indices have already been introduced. Different regression methods have been used for building and verifying such models; the most important are multiple linear regression (including stepwise, forward selection and backward elimination) and partial least squares regression.
Materials and Methods: To evaluate different approaches for modeling soil salinity from remotely sensed data, an area of about 50,000 ha in the Sabzevar-Davarzan plain was studied in 2013 and 2014. Sampling locations were determined using a Latin Hypercube Sampling (LHS) strategy, with 97 points in 2013 and 25 points in 2014. All points were sampled to a depth of 90 cm in 30 cm increments. In total, 366 soil samples were analyzed in the laboratory for electrical conductivity (EC) of the saturated extract. An electromagnetic induction device (EM38) was also used to measure bulk soil electrical conductivity: at the sampling points in the first year, and at each sampling point plus 8 surrounding points in the second year. This gave 97 and 225 EM measurements for the first and second years, respectively. Mean measured soil EC data were calibrated against the EM measurements; given the fair correlations found, EM and EC data could be converted to each other. Twenty-three spectral indices derived from Landsat 8 images for the sampling dates, along with a digital elevation model (DEM), were used as independent variables. Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR) were evaluated for their ability to predict soil salinity from the independent variables in different calibration and verification datasets.
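The calibration of laboratory EC against EM38 readings described above can be illustrated with a simple least-squares line. This is only a minimal sketch: the abstract does not state the form of the calibration equation, so the linear form is an assumption, and the paired readings, their units and the names `fit_line`, `r_squared`, `em_readings` and `ec_values` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical paired values, invented for illustration only:
# EM38 readings and profile-mean laboratory EC at the same points.
em_readings = [20.0, 45.0, 80.0, 120.0, 160.0, 210.0]
ec_values = [1.9, 4.2, 8.5, 11.8, 16.4, 20.7]

intercept, slope = fit_line(em_readings, ec_values)
ec_from_em = [intercept + slope * em for em in em_readings]
calibration_r2 = r_squared(ec_values, ec_from_em)

print(f"EC = {intercept:.3f} + {slope:.4f} * EM, R^2 = {calibration_r2:.3f}")
```

Once such a line is fitted, an EM reading at an unsampled point can be converted to an estimated EC with `intercept + slope * em`, which is one way the larger set of EM measurements could stand in for laboratory values.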
Results and Discussion: Different multiple linear regression approaches were evaluated using the first-year data for training and the second-year data for testing, and vice versa. They produced determination coefficients of about 22 to 88 percent in the training dataset, but the determination coefficient did not exceed 29 percent in the test dataset. Because of multicollinearity among the independent variables, multiple linear regression could not be applied to all variables. Excluding the collinear variables, log-transforming the data and randomly dividing them into training and test datasets raised the determination coefficient of the model and its validation to an acceptable level. Partial least squares regression applied to the original and log-transformed data, with the first and second years as training and test datasets and vice versa, gave determination coefficients of about 39 to 85 percent in the training dataset but was unable to predict in the test dataset. Randomly dividing all data into training and test datasets considerably increased the determination coefficient in the verification dataset. Repeating the randomization showed that the approach has the consistency required for estimating the coefficients of the variables.
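The repeated randomization check can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the data are synthetic, a single predictor stands in for the spectral indices, and the two-thirds training fraction follows the split stated in the Conclusions. The names `random_split`, `fit_line` and `r_squared` are hypothetical helpers.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def random_split(n, train_fraction, rng):
    """Shuffle the indices 0..n-1 and cut them into train and test lists."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


# Synthetic data (assumption): one predictor with a linear effect plus noise.
data_rng = random.Random(0)
x = [data_rng.uniform(0.0, 1.0) for _ in range(120)]
y = [2.0 + 3.0 * xi + data_rng.gauss(0.0, 0.2) for xi in x]

slopes, test_r2 = [], []
for seed in range(20):  # repeat the randomization with different seeds
    train, test = random_split(len(x), 2 / 3, random.Random(seed))
    intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
    predicted = [intercept + slope * x[i] for i in test]
    test_r2.append(r_squared([y[i] for i in test], predicted))
    slopes.append(slope)

print(f"slope range over repeats: {min(slopes):.3f} to {max(slopes):.3f}")
print(f"test R^2 range over repeats: {min(test_r2):.3f} to {max(test_r2):.3f}")
```

A narrow range of fitted coefficients across repeats is the kind of consistency the text refers to; a wide range would suggest the model depends heavily on which points happen to fall in the training set.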
Conclusions: A wide range of independent variables can be used to predict soil salinity from remotely sensed data and indices. However, these independent variables generally show multicollinearity among themselves. The correlation matrix, the variance inflation factor and tolerance indices can be used to identify multicollinearity. Removing or scaling highly collinear variables can improve the regression. Data transformations, including log transformation, can also significantly improve the strength of the regression. In this research, EM data showed more significant correlations with spectral indices than laboratory-measured EC data did. As the EM38 device itself measures in a specific range of the electromagnetic spectrum, this higher correlation could be expected. Such models should be calibrated and verified against ground truth data. Generally, one part of the dataset is used for calibration (building the model) and the remainder for verification (testing the model). Randomly dividing the total data of the 2 years into calibration (2/3 of the data) and verification (1/3 of the data) sets significantly improved the regression in the verification dataset. This procedure widens the range of variability in the data used for calibration and verification and prevents outlier predictions.
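The variance inflation factor (VIF) and tolerance mentioned above follow from regressing each independent variable on all the others: tolerance is 1 - R^2 of that regression and VIF is its reciprocal. The sketch below is a minimal standard-library illustration on synthetic data. The variable names (`index_a`, `index_b`, `elevation`) and values are hypothetical, and the threshold of 10 is a commonly cited rule of thumb, not a value taken from the study.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def r_squared_on_others(columns, target):
    """R^2 from regressing columns[target] on the other columns plus an intercept."""
    y = columns[target]
    others = [col for j, col in enumerate(columns) if j != target]
    design = [[1.0] + [col[i] for col in others] for i in range(len(y))]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif_and_tolerance(columns):
    """Return a (VIF, tolerance) pair for every column."""
    results = []
    for j in range(len(columns)):
        tolerance = 1.0 - r_squared_on_others(columns, j)
        results.append((1.0 / tolerance, tolerance))
    return results


# Synthetic predictors (assumption): index_b nearly duplicates index_a,
# while elevation is generated independently of both.
rng = random.Random(1)
index_a = [rng.uniform(0.0, 1.0) for _ in range(60)]
index_b = [a + rng.gauss(0.0, 0.05) for a in index_a]
elevation = [rng.uniform(0.9, 1.1) for _ in range(60)]

diagnostics = vif_and_tolerance([index_a, index_b, elevation])
for name, (vif, tol) in zip(["index_a", "index_b", "elevation"], diagnostics):
    flag = "collinear" if vif > 10 else "ok"
    print(f"{name}: VIF = {vif:.2f}, tolerance = {tol:.3f} ({flag})")
```

In this setup the two near-duplicate indices receive high VIF values and would be candidates for removal or rescaling, while the independent variable stays close to a VIF of 1.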