Similar literature
Found 20 similar articles; search took 140 ms
1.
The selection of a reliable inference model is a crucial step in developing ecologically sound reconstructions of environmental variables in the past. We compared intra- and inter-regional regression-based models, and an inter-regional Modern Analogue Technique (MAT) model in their ability to infer lakewater pH from scaled chrysophyte assemblages. The performance of each model was assessed by examining cross-validated coefficients of determination and prediction errors, and through reconstructing the pH of 50 modern and fossil samples in south-central Ontario, Canada. Using the intra- and inter-regional data sets, we found little difference in the ability of the regression-based models to infer present-day pH. Partial Least Squares (PLS) regression, Weighted Averaging (WA), and Weighted Averaging Partial Least Squares (WA-PLS) inference models showed similar values for jack-knifed coefficients of determination (r²_jack), root mean squared errors of prediction (RMSEP_jack), and mean and maximum biases. Based on an analogue matching approach, the inferred values from 48 fossil sediment samples suggested that the intra-regional model did not provide reliable reconstructions for approximately half of the fossil samples. However, inferences from the inter-regional MAT and regression-based models were found to have appropriate analogues and were thus considered to be more reliable.

2.
The methods PARAFAC and three-way PLS are compared with respect to their ability to predict reversed-phase retention values. Special attention is paid to simple validatory tools, the meaning and use of which are explained. The simple validatory tools consist of percentages of explained variation in the training set and those that can be calculated with the use of markers. These markers are special (reference) solutes, the retention values of which are used to gain information about a new object for which predictions are wanted. Different validatory tools can be calculated with the use of these marker retention values: percentages of used variation and mean sum of squared residuals after applying the model to these marker retention values. The validatory tools are evaluated on their power to estimate their test set counterparts: the percentages of explained variation in the test set and mean sum of squared prediction errors in the test set. Two different data sets from reversed-phase chromatography are used to evaluate the validatory tools. The first data set has a high signal-to-noise ratio and is measured under the same measurement conditions. The second data set has a low signal-to-noise ratio and is measured under different measurement conditions. Some of the simple validatory tools seem to have relevance to their test set counterparts, even in the case of the second data set.

3.
In several LUCC studies, statistical methods are being used to analyze land use data. A problem with using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor is best able to describe land use patterns quantitatively, with most of the statistical relations derived from it agreeing with the facts. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.

4.
In several LUCC studies, statistical methods are being used to analyze land use data. A problem with using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor is best able to describe land use patterns quantitatively, with most of the statistical relations derived from it agreeing with the facts. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.

5.
This paper compares two land change models in terms of appropriateness for various applications and predictive power. Cellular Automata Markov (CA_Markov) and Geomod are the two models, which have similar options to allow for specification of the predicted quantity and location of land categories. The most important structural difference is that CA_Markov has the ability to predict any transition among any number of categories, while Geomod predicts only a one‐way transition from one category to one alternative category.

To assess the predictive power, each model is run several times to predict land change in central Massachusetts, USA. The models are calibrated with information from 1971 to 1985, and then the models predict the change from 1985 to 1999. The method to measure the predictive power: 1) separates the calibration process from the validation process, 2) assesses the accuracy at multiple resolutions, and 3) compares the predictive model vis‐à‐vis a null model that predicts pure persistence. Among 24 model runs, the predictive models are more accurate than the null model at resolutions coarser than two kilometres, but not at resolutions finer than one kilometre. The choice of the options accounts for more variation in accuracy of runs than the choice of the model per se. The most accurate model runs are those that did not use spatial contiguity explicitly. For this particular study area, the added complexity of CA_Markov is of no benefit.
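The comparison against a null model of pure persistence can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it scores a single resolution only (the multiple-resolution step is omitted), and the three category maps are hypothetical values invented for the example.

```python
def percent_correct(predicted, reference):
    """Share of cells (0 to 100) whose predicted category matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("maps must have the same number of cells")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Hypothetical flattened category maps (0 = non-built, 1 = built).
# These are NOT data from the study; they only illustrate the comparison.
map_1985 = [0, 0, 0, 1, 1, 0, 0, 1]               # end of calibration period
map_1999 = [0, 1, 0, 1, 1, 0, 1, 1]               # reference map for validation
model_prediction_1999 = [0, 0, 1, 1, 1, 0, 1, 1]  # output of some land change model

# Null model: pure persistence, i.e. predict that nothing changes after 1985.
null_accuracy = percent_correct(map_1985, map_1999)
model_accuracy = percent_correct(model_prediction_1999, map_1999)

print(f"null model:       {null_accuracy:.1f}% correct")
print(f"predictive model: {model_accuracy:.1f}% correct")
```

In this toy case the model gets one change right but also places one change in the wrong cell, so it only ties the null model, which is the kind of outcome a persistence baseline is meant to expose.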

6.
The standard deviation of prediction errors (SDEP) is used to evaluate and compare the predictive ability of some regression models, namely MLR, ACE and linear and non-linear PLS, the last being the best one. The parameter is determined by a cross-validation approach as an average of several runs obtained on forming groups in a random way. The variation in SDEP with the number of latent variables in PLS is also discussed.
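A minimal sketch of SDEP estimated by repeated random-group cross-validation is given below. It assumes the common definition SDEP = sqrt(sum of squared prediction errors / N), and it swaps in a one-variable least-squares line for the MLR, ACE, and PLS models of the paper to stay self-contained. The function names, the number of groups and runs, and the data are all hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sdep(xs, ys, n_groups=3, n_runs=20, seed=0):
    """Average SDEP over several cross-validation runs with random groups."""
    rng = random.Random(seed)
    n = len(xs)
    run_values = []
    for _ in range(n_runs):
        order = list(range(n))
        rng.shuffle(order)                      # form the groups at random
        press = 0.0                             # summed squared prediction errors
        for g in range(n_groups):
            held_out = set(order[g::n_groups])  # every n_groups-th shuffled index
            train = [i for i in range(n) if i not in held_out]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        run_values.append(math.sqrt(press / n))
    return sum(run_values) / n_runs


# Hypothetical data: y is roughly 2x + 1 with small disturbances.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [3.2, 4.7, 7.4, 8.6, 11.3, 12.8, 15.1, 17.4, 18.7]
print(f"SDEP = {sdep(xs, ys):.3f}")
```

Averaging over several random groupings reduces the dependence of the estimate on any single split, which is the motivation stated in the abstract.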

7.
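Quantitative analysis of soil salinity in Minqin based on hyperspectral data   Total citations: 2 (self-citations: 0, citations by others: 2)
Pang Guojin, Wang Tao, Sun Jiahuan, Li Sen. 《中国沙漠》 (Journal of Desert Research), 2014, 34(4): 1073-1079
Soil salinization is a major ecological and environmental problem that seriously affects agriculture, animal husbandry, and economic development in arid and semi-arid regions. Hyperspectral remote sensing provides continuous spectral information on ground objects, makes subtle differences easier to analyze, and therefore has clear advantages for quantifying soil salt content. Minqin County lies in the lower reaches of the Shiyang River basin in Gansu Province, where water resources are scarce and salinization is severe. This study uses laboratory spectral data to build models for the quantitative analysis of soil salt content. The raw data were first preprocessed by continuum removal (cn). A hyperspectral index model (NDSI), a partial least squares regression model (PLS), an interval partial least squares model (iPLS), and a backward interval partial least squares model (BiPLS) were then built, and their ability to predict soil salinity was examined. The comparison shows that the PLS model, which uses information from all bands, outperforms the NDSI model, which uses only two bands, while the iPLS and BiPLS models, which select characteristic bands for modelling, both give better results than the full-spectrum PLS model. BiPLS selects bands better than iPLS and yields the best model, with a relative prediction deviation (RPD) of 2.02, a coefficient of determination R² of 0.76, and a slope of 0.92 for the linear regression between simulated and predicted values; the model can approximately predict soil salt content. The results indicate that characteristic band selection can extract useful information from large amounts of data, simplify the model, and give better predictions than the NDSI and full-spectrum PLS models. These findings are of some significance for the quantitative analysis of soil salinization with hyperspectral data.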
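The RPD statistic quoted above can be illustrated with a short sketch. It assumes the widely used definition RPD = (standard deviation of the reference values) / RMSEP; the abstract does not say which variant the authors used, and the salinity values below are hypothetical.

```python
import math
import statistics


def rmsep(observed, predicted):
    """Root mean squared error of prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Sample standard deviation of the observations divided by RMSEP (assumed definition)."""
    return statistics.stdev(observed) / rmsep(observed, predicted)


# Hypothetical validation set: measured and predicted soil salt content.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]

print(f"RMSEP = {rmsep(observed, predicted):.3f}")
print(f"RPD   = {rpd(observed, predicted):.3f}")
```

Because RPD scales the prediction error by the spread of the reference data, it allows models calibrated on differently distributed samples to be compared more fairly than RMSEP alone.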

8.
A procedure called GOLPE is suggested in order to detect those variables which increase the predictivity of PLS models. The procedure is based on evaluating the predictive power of a number of PLS models built by different combinations of variables selected according to a factorial design strategy. Examples are given of the efficiency of this variable selection procedure, which shows how these predictive PLS models are better than those obtained by all variables and better than the corresponding ordinary regression models.

9.
The diatom composition in surface sediments from 119 northern Swedish lakes was studied to examine the relationship with lake-water pH, alkalinity, and colour. Diatom-based predictive models, using weighted-averaging (WA) regression and calibration, partial least squares (PLS) regression and calibration, and weighted-averaging partial least squares (WA-PLS) regression and calibration, were developed for inferences of water chemistry conditions. The non-linear response between the diatom assemblages and pH and alkalinity was best modelled by weighted-averaging methods. The lowest prediction error for pH was obtained using weighted averaging, with or without tolerance downweighting. For alkalinity there was still some information in the residual structure after extracting the first weighted-averaging component, which resulted in a slight improvement of predictions when using a two component WA-PLS model. The best colour predictions were obtained using a two component PLS model. Principal component analysis (PCA) of the prediction errors, with some characteristics of the training set included as passive variables, was performed to compare the results for the different alkalinity predictive models. The results indicate that calibration techniques utilizing more than one component (PLS and WA-PLS) can improve the predictions for lakes with diatom taxa that have broad tolerances. Furthermore, we show that WA-PLS performs best compared with the other techniques for those lakes that have a high relative abundance of the most dominant taxa and a corresponding low sample heterogeneity.
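Weighted-averaging regression and calibration can be sketched in a few lines. The version below is deliberately simplified: it omits the deshrinking step and tolerance downweighting, it assumes every taxon occurs in at least one training sample, and the abundances and pH values are hypothetical rather than taken from the Swedish data set.

```python
def wa_optima(abundances, env):
    """WA regression: each taxon's optimum is the abundance-weighted mean of env.

    abundances[i][k] is the abundance of taxon k in training sample i,
    env[i] is the measured variable (e.g. pH) for that sample.
    """
    n_taxa = len(abundances[0])
    optima = []
    for k in range(n_taxa):
        total = sum(row[k] for row in abundances)
        weighted = sum(row[k] * x for row, x in zip(abundances, env))
        optima.append(weighted / total)
    return optima


def wa_infer(sample, optima):
    """WA calibration: abundance-weighted mean of the optima of the taxa present."""
    return sum(a * u for a, u in zip(sample, optima)) / sum(sample)


# Hypothetical training set: three lakes, three taxa.
training_abundances = [
    [8, 2, 0],
    [4, 4, 2],
    [0, 2, 8],
]
training_ph = [5.0, 6.0, 7.0]

optima = wa_optima(training_abundances, training_ph)
inferred_ph = wa_infer([0, 5, 5], optima)
print("taxon optima:", [round(u, 3) for u in optima])
print("inferred pH: ", round(inferred_ph, 3))
```

Taking averages twice (once for the optima, once for the inference) shrinks the inferred range, which is why practical WA implementations add a deshrinking regression that this sketch leaves out.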

10.
Annual variations in Russian wheat production are considerable and result mainly from changes in yields. Yield variation results from a linear time trend, generally taken to be the result of technology, and from climatic fluctuations. Using time and a limited set of climatic data from the growing season it is possible to model wheat yields and production using multiple linear regression techniques. Testing the model on recent data shows an encouraging predictive ability.

11.
Effects of sample size on the accuracy of geomorphological models   Total citations: 1 (self-citations: 1, citations by others: 0)
Commonly, the most costly part of geomorphological distribution modelling studies is gathering the data. Thus, guidance for researchers concerning the quantity of field data needed would be extremely practical. This paper scrutinises the relationship between the sample size (the number of observations varied from 20 to 600) and the predictive ability of the generalized linear model (GLM), generalized additive model (GAM), generalized boosting method (GBM) and artificial neural network (ANN) in two data settings, i.e., independent and split-sample approaches. The study was performed using empirical data of periglacial processes from an area of 600 km² in northernmost Finland at grid resolutions of 1 ha (100 × 100 m) and 25 ha (500 × 500 m). A rather sharp increase in the predictive ability of the models was observed when the number of observations increased from 20 to 100, and the level of robust predictions was reached with 200 observations. The result indicates that no more than a few hundred observations are needed in geomorphological distribution modelling at a medium scale resolution (ca. 0.01–1 km²).

12.
Polar Science, 2014, 8(3): 242-254
In this paper we examine 2- and 3-way chemometric methods for analysis of Arctic and Antarctic water samples. Standard CTD (conductivity–temperature–depth) sensor devices were used during two oceanographic expeditions (July 2007 in the Arctic; February 2009 in the Antarctic) covering a total of 174 locations. The output from these devices can be arranged in a 3-way data structure (according to sea water depth, measured variables, and geographical location). We used and compared 2- and 3-way statistical tools including PCA, PARAFAC, PLS, and N-PLS for exploratory analysis, spatial patterns discovery and calibration. Particular importance was given to the correlation and possible prediction of fluorescence from other physical variables. MATLAB's mapping toolbox was used for geo-referencing and visualization of the results. We conclude that: 1) PCA and PARAFAC models were able to describe data in a satisfactory way, but PARAFAC results were easier to interpret; 2) applying a 2-way model to 3-way data raises the risk of flattening the covariance structure of the data and losing information; 3) the distinction between Arctic and Antarctic seas was revealed mostly by PC1, relating to the physico-chemical properties of the water samples; and 4) we confirm the ability to predict fluorescence values from physical measurements when the 3-way data structure is used in N-way PLS regression.

13.
For the calibration of chromatographic systems, different methods can be used. One class of methods utilizes three-way approaches. The calibration problem is stated in such a way that the decomposition of a three-way array can serve for the prediction of retention on new stationary phases. Two three-way approaches are presented: the Unfold-PCA and PARAFAC models. The theory of both methods is presented and the differences are highlighted, the main difference being that PARAFAC is a trilinear decomposition whereas Unfold-PCA is not. Both three-way methods are evaluated on a small data set consisting of retention measurements of eight solutes at six mobile phase compositions on six stationary phases. The differences in performance of the two models are minor. For calibration purposes, two variants of the methods are discussed: three-way PLS and an extension of PARAFAC. Again the theory and differences between the two methods are explained. The predictive performance of the two methods is compared using the same data set as earlier. The differences in predictive performance, however, are minor. Both methods are capable of predicting 98% of the variation in the test sets. Yet, there are other considerations when comparing methods than predictive performance, e.g. the quality of the predictions.

14.
A ROBUST PLS PROCEDURE   Total citations: 1 (self-citations: 0, citations by others: 1)
A robust partial least squares (PLS) regression algorithm is developed. This is achieved by substitution of the univariate regression steps in the iterative PLS2 algorithm by a robust alternative. The angle between loading vectors from both perturbed and unperturbed solutions is used as a measure of robustness. By means of a perturbation study on a structure-activity data set, it is demonstrated that the stability of the robust method is superior to standard PLS.
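The robustness measure described above, the angle between loading vectors from perturbed and unperturbed solutions, can be computed as in the sketch below. Taking the absolute value of the cosine is an assumption made here because the sign of a loading vector is arbitrary; the abstract does not state how the authors handled this, and the loading values are hypothetical.

```python
import math


def loading_angle_degrees(p, q):
    """Angle in degrees between two loading vectors, ignoring their sign."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    cosine = abs(dot) / (norm_p * norm_q)
    # Clamp to 1.0 so rounding error cannot push acos outside its domain.
    return math.degrees(math.acos(min(1.0, cosine)))


# Hypothetical first loading vectors from an unperturbed and a perturbed fit.
unperturbed = [0.70, 0.50, 0.51]
perturbed = [0.68, 0.55, 0.49]

angle = loading_angle_degrees(unperturbed, perturbed)
print(f"angle between loadings: {angle:.2f} degrees")
```

A small angle means the perturbation barely rotated the loading vector, so a method that keeps this angle small across perturbations would be judged more stable under this measure.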

15.
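A model study of the influence of climatic factors on sandstorms in Northwest China   Total citations: 6 (self-citations: 3, citations by others: 3)
Li Zhiyong. 《中国沙漠》 (Journal of Desert Research), 2009, 29(3): 415-420
Climatic factors are one of the necessary conditions for sandstorm formation, so a quantitative model suited to Northwest China for measuring their influence on sandstorms is very important. Building on earlier dust-weather models, this study emphasizes calculating an aridity index for Northwest China from temperature, precipitation, and evaporation, so that the regional water balance is taken into account, and combines it with a wind-speed influence index calculated from the number of strong-wind days and wind speed to build a new climatic influence index model for sandstorms. The model was applied to six provinces and autonomous regions of Northwest China (Xinjiang, Qinghai, Gansu, Shaanxi, Ningxia, and Inner Mongolia). Regression analysis of their 1961-1980 meteorological data shows a good correlation between the climatic influence index D and the number of sandstorm days S. Predicted sandstorm days calculated from 1981-2005 meteorological data for Shaanxi, Gansu, and Inner Mongolia were compared with the observed values for these three regions; the model fits well and is fairly effective in revealing the influence of climatic factors on sandstorms.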

16.
We developed an inference model to infer dissolved organic carbon (DOC) in lakewater from lake sediments using visible-near-infrared spectroscopy (VNIRS). The inference model used surface sediment samples collected from 160 Arctic Canada lakes, covering broad latitudinal (60–83°N), longitudinal (71–138°W) and environmental gradients, with a DOC range of 0.6–39.6 mg L⁻¹. The model was applied to Holocene lake sediment cores from Sweden and Canada and our inferences are compared to results from previous multiproxy paleolimnological investigations at these two sites. The inferred Swedish and Canadian DOC profiles are compared, respectively, to inferences from a Swedish-based VNIRS-total organic carbon (TOC) model and a Canadian-based diatom-inferred (Di-DOC) model from the same sediment records. The 5-component Partial Least Squares (PLS) model yields a cross-validated (CV) R²_CV = 0.61 and a root mean squared error of prediction (RMSEP_CV) = 4.4 mg L⁻¹ (11% of DOC gradient). The trends inferred for the two lakes were remarkably similar to the VNIRS-TOC and the Di-DOC inferred profiles and consistent with the other paleolimnological proxies, although absolute values differed. Differences in the calibration set gradients and lack of analogous VNIRS signatures in the modern datasets may explain this discrepancy. Our results corroborate previous geographically independent studies on the potential of using VNIRS to reconstruct past trends in lakewater DOC concentrations rapidly.

17.
PLS1 regression is generally viewed as lying in between PCR and OLS regression. Proof is given that the coefficient of determination, R², for a PLS multivariate calibration model is at least as high as that for a PCR model with the same number of components. It appears that PLS can be linked to a correlation-weighted polynomial regression of a constant response on the eigenvalues of the covariance matrix of the predictor variables.

18.
WHICH PRINCIPAL COMPONENTS TO UTILIZE FOR PRINCIPAL COMPONENT REGRESSION   Total citations: 1 (self-citations: 0, citations by others: 1)
Principal components (PCs) for principal component regression (PCR) have historically been selected from the top down for a reliable predictive model. That is, the PCs are arranged in a list starting with the most informative (PC associated with the largest singular value) and proceeding to the least informative (PC associated with the smallest singular value). PCs are then chosen starting at the top of this list. This paper discusses an alternative procedure of treating PC selection as an optimization problem. Specifically, without any regard to the ordering, the optimal subset of PCs for an acceptable predictive model is desired. Five data sets are analyzed using the conventional and alternative approaches. Two data sets are spectroscopic in nature, two data sets deal with quantitative structure-activity relationships (QSARs) and one data set is concerned with modeling. All five data sets confirm that selection of a subset without consideration to order secures the best results with PCR. One data set is also compared using partial least squares 1.

19.
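Using 1989-2006 fishery resource data for Xishuangbanna Prefecture in the lower Lancang River basin, this study analyzes how fishery resources changed over time. Combined with hydrological data for 1989-2003, an artificial neural network was used to analyze the relationship between environmental factors and fishery catch, and the 2006 data were used to test the predictions. The results show that aquatic product output in Xishuangbanna Prefecture increased year by year from 1989 to 2006, mainly because of the expansion of aquaculture area and advances in aquaculture technology. Fishery catch showed an overall upward trend but declined in 1990-1991, 1998-1999, and 2001-2003, which is related to strengthened fishery administration and law enforcement. Air temperature and annual precipitation had the greatest influence on fishery catch, followed by maximum water temperature, minimum sediment concentration, and minimum runoff. Values simulated by the artificial neural network correlated highly with observed values; the relative error between the predicted and observed values for 2006 was 9.7%, indicating good predictive performance.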

20.
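Using spaceborne Hyperion hyperspectral imagery as the data source, this study systematically compares the ability of NDVI and partial least squares (PLS) regression to estimate vegetation cover in desertified areas. The samples used for model building (n = 46) and for independent validation (n = 10) were all ground measurements. The results show that NDVI and PLS models based on spaceborne hyperspectral data can effectively estimate vegetation cover in desertified areas. Compared with broadband NDVI (RMSEP = 10.5618) and the standard hyperspectral NDVI calculated from 803.3/671.02 nm (RMSEP = 8.3863), an NDVI built from specifically selected hyperspectral bands (823.65/701.55 nm) predicted vegetation cover with clearly lower error (RMSEP = 6.5189). PLS regression models built on the raw reflectance, first derivative, and continuum-removed spectra of all hyperspectral bands clearly outperformed NDVI, which uses information from only two bands; the PLS model based on raw reflectance performed best, with an RMSEP of 4.4998, about 23% of the mean of the dependent variable.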
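The two narrowband NDVI variants compared above differ only in which band pair enters the normalized difference (NIR − red) / (NIR + red). A minimal sketch follows; the band centres are the ones quoted in the abstract, while the reflectance values and the helper names are hypothetical.

```python
def nearest_band(spectrum, wavelength_nm):
    """Return the band centre in the spectrum closest to the requested wavelength."""
    return min(spectrum, key=lambda w: abs(w - wavelength_nm))


def normalized_difference(spectrum, nir_nm, red_nm):
    """(NIR - red) / (NIR + red) using the bands nearest to the given wavelengths."""
    nir = spectrum[nearest_band(spectrum, nir_nm)]
    red = spectrum[nearest_band(spectrum, red_nm)]
    return (nir - red) / (nir + red)


# Hypothetical pixel spectrum: band centre (nm) -> reflectance.
spectrum = {671.02: 0.10, 701.55: 0.15, 803.30: 0.40, 823.65: 0.45}

standard_ndvi = normalized_difference(spectrum, 803.3, 671.02)
selected_ndvi = normalized_difference(spectrum, 823.65, 701.55)
print(f"standard hyperspectral NDVI (803.3/671.02 nm): {standard_ndvi:.3f}")
print(f"selected-band NDVI (823.65/701.55 nm):         {selected_ndvi:.3f}")
```

Either index would then be regressed against measured vegetation cover; which band pair gives the lower RMSEP is an empirical question that the abstract answers for this particular data set.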
