Similar articles
 20 similar articles found (search time: 359 ms)
1.
A procedure called GOLPE is suggested in order to detect those variables which increase the predictivity of PLS models. The procedure is based on evaluating the predictive power of a number of PLS models built by different combinations of variables selected according to a factorial design strategy. Examples are given of the efficiency of this variable selection procedure, which shows how these predictive PLS models are better than those obtained by all variables and better than the corresponding ordinary regression models.

2.
WHICH PRINCIPAL COMPONENTS TO UTILIZE FOR PRINCIPAL COMPONENT REGRESSION   Total citations: 1 (self-citations: 0, citations by others: 1)
Principal components (PCs) for principal component regression (PCR) have historically been selected from the top down for a reliable predictive model. That is, the PCs are arranged in a list starting with the most informative (PC associated with the largest singular value) and proceeding to the least informative (PC associated with the smallest singular value). PCs are then chosen starting at the top of this list. This paper discusses an alternative procedure of treating PC selection as an optimization problem. Specifically, without any regard to the ordering, the optimal subset of PCs for an acceptable predictive model is desired. Five data sets are analyzed using the conventional and alternative approaches. Two data sets are spectroscopic in nature, two data sets deal with quantitative structure-activity relationships (QSARs) and one data set is concerned with modeling. All five data sets confirm that selection of a subset without consideration to order secures the best results with PCR. One data set is also compared using partial least squares 1.
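The contrast between top-down selection and subset selection can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, leave-one-out RMSE is assumed as the selection criterion, and the subset search is a brute-force enumeration that is practical only for a small number of candidate PCs.

```python
import itertools
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, pc_idx):
    """Principal component regression using only the PCs listed in pc_idx."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    # Rows of Vt are PC loadings, ordered from largest to smallest singular value.
    _, _, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    V = Vt[list(pc_idx)].T
    T = (X_train - x_mean) @ V                      # scores on the chosen PCs
    coef, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    return y_mean + ((X_test - x_mean) @ V) @ coef

def loo_rmse(X, y, pc_idx):
    """Leave-one-out root mean squared prediction error (assumed criterion)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], pc_idx)
        errors.append(y[i] - pred[0])
    return float(np.sqrt(np.mean(np.square(errors))))

def best_top_down(X, y, max_pcs):
    """Conventional choice: the first k PCs, for the best k."""
    candidates = [tuple(range(k)) for k in range(1, max_pcs + 1)]
    return min(candidates, key=lambda s: loo_rmse(X, y, s))

def best_subset(X, y, max_pcs):
    """Alternative choice: any subset of the first max_pcs PCs, ignoring order."""
    candidates = [s for k in range(1, max_pcs + 1)
                  for s in itertools.combinations(range(max_pcs), k)]
    return min(candidates, key=lambda s: loo_rmse(X, y, s))
```

Because every top-down candidate is also a subset candidate, the subset search can never score worse on the selection criterion itself; whether that advantage carries over to independent test data is a separate question that this sketch does not address.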

3.
New data technologies and modelling methods have gained more attention in the field of periglacial geomorphology during the last decade. In this paper we present a new modelling approach that integrates topographical, ground and remote sensing information in predictive geomorphological mapping using generalized additive modelling (GAM). First, we explored the roles of different environmental variable groups in determining the occurrence of non‐sorted and sorted patterned ground in a fell region of 100 km² at the resolution of 1 ha in northern Finland. Second, we compared the predictive accuracy of ground‐topography‐ and remote‐sensing‐based models. The results indicate that non‐sorted patterned ground is more common at lower altitudes where the ground moisture and vegetation abundance are relatively high, whereas sorted patterned ground is dominant at higher altitudes with relatively high slope angle and sparse vegetation cover. All modelling results ranged from good to excellent on the model evaluation data, as measured by the area under the curve (AUC) values derived from receiver operating characteristic (ROC) plots. Generally, models built with remotely sensed data were better than ground‐topography‐based models, and the combination of all environmental variables improved the predictive ability of the models. This paper confirms the potential utility of remote sensing information for modelling patterned ground distribution in subarctic landscapes.

4.
The goal of this research is to create a theoretical framework for the identification of cancer risk factor disparities and address the recognition of geographic patterns in these factors. Thirty-four secondary variables covering the entire US at the county level in 2010 were analyzed, both individually and grouped (theoretically and statistically), in relation to the mortality to incidence ratio (MIR) for all cancer sites. An a priori assessment and a principal components analysis (PCA) were used to group variables to test societal constructs. OLS and geographically weighted regressions (GWRs) were used to assess the influence of both individual and grouped variables on the MIR. The theoretical grouping of variables showed little change in the predictive capability of the OLS models. The GWR model showed marked improvement over the OLS. Maps produced using local R² showed clear regional patterns of influence between the indicators and the MIR. Both the theoretical model and the justification for a spatial approach to cancer risk factor disparities were shown to be effective in this paper. The link between this suite of indicators and the health outcomes is clear, and supports the idea that a full representation of the SES landscape should be used both to predict health outcomes and to assess policy options for improving these outcomes. With the presence of definitive regional patterns and clear connections between the MIR and societal groupings, the findings from this research suggest a need to shift to a more comprehensive and spatial approach to cancer disparities research.

5.
The aim of this study is to identify the predictive factors and variables that motivate decisions to supply sustainable or green commercial properties, and to apply the discriminant analysis technique to assess whether there are significant differences in perception between real estate developers in Malaysia and Nigeria based on the identified variables. The result revealed a significant discriminant function differentiating the two countries based on their perception of the variables. The motivational components and attributes were found to be in favor of Malaysia. The Wilks' lambda F‐test and the standardized discriminant function coefficients showed that there were significant differences between developers in both countries as assessed by the life‐cycle cost motivations, green policies and certification, market strategy, developers' expected rate of return, green tax incentive, and available green skills. However, the variables with the most predictive power in accounting for the differences were found to be within the measures of life‐cycle, cost‐saving motivations.

6.
This paper compares two land change models in terms of appropriateness for various applications and predictive power. Cellular Automata Markov (CA_Markov) and Geomod are the two models, which have similar options to allow for specification of the predicted quantity and location of land categories. The most important structural difference is that CA_Markov has the ability to predict any transition among any number of categories, while Geomod predicts only a one‐way transition from one category to one alternative category.

To assess the predictive power, each model is run several times to predict land change in central Massachusetts, USA. The models are calibrated with information from 1971 to 1985, and then the models predict the change from 1985 to 1999. The method to measure the predictive power: 1) separates the calibration process from the validation process, 2) assesses the accuracy at multiple resolutions, and 3) compares the predictive model vis‐à‐vis a null model that predicts pure persistence. Among 24 model runs, the predictive models are more accurate than the null model at resolutions coarser than two kilometres, but not at resolutions finer than one kilometre. The choice of the options accounts for more variation in accuracy of runs than the choice of the model per se. The most accurate model runs are those that did not use spatial contiguity explicitly. For this particular study area, the added complexity of CA_Markov is of no benefit.

7.
Effects of spatial autocorrelation (SAC), or spatial structure, have often been neglected in the conventional models of pedogeomorphological processes. Based on soil, vegetation, and topographic data collected in a coastal dunefield in western Korea, this research developed three soil moisture–landscape models, each incorporating SAC at fine, broad, and multiple scales, respectively, into a non-spatial ordinary least squares (OLS) model. All of these spatially explicit models showed better performance than the OLS model, as consistently indicated by R², Akaike’s information criterion, and Moran’s I. In particular, the best model proved to be the one using spatial eigenvector mapping, a technique that accounts for spatial structure at multiple scales simultaneously. After including SAC, predictor variables with greater inherent spatial structure underwent more reduction in their predictive power than those with less structure. This finding implies that the environmental variables pedogeomorphologists have perceived as important in conventional regression modeling may have a reduced predictive power in reality, in cases where they possess a significant amount of SAC. This research demonstrates that accounting for spatial structure not only helps to avoid the violation of statistical assumptions, but also allows a better understanding of dynamic soil hydrological processes occurring at different spatial scales.

8.
9.
When the number of variables exceeds the number of samples, one method of multivariate discrimination is to use principal components analysis to reduce the dimensionality and then to perform canonical variates analysis (PC-CVA). This paper proposes an alternative approach in which discriminant analysis is carried out by a weighted principal component analysis of the group means (DPCA). This method does not require prior data reduction and produces discriminant factors that are orthogonal in the original data space. The theory and performance of the two methods are compared. Although the individual factors of DPCA are found to be less discriminating than PC-CVA, the overall discrimination, calculated by multivariate analysis of variance, and the predictive value, estimated by the leaving-one-out error rate, are broadly comparable.

10.
Comparing models of debris-flow susceptibility in the alpine environment   Total citations: 12 (self-citations: 3, citations by others: 9)
Debris flows are widespread in Val di Fassa (Trento Province, Eastern Italian Alps), where they constitute one of the most dangerous gravity-induced surface processes. From a large set of environmental characteristics and a detailed inventory of debris flows, we developed five models to predict the location of debris-flow source areas. The models differ in approach (statistical vs. physically-based) and type of terrain unit of reference (slope unit vs. grid cell). In the statistical models, a mix of several environmental factors classified areas with different debris-flow susceptibility; however, the factors that exert a strong discriminant power reduce to conditions of high slope-gradient, pasture or no vegetation cover, availability of detrital material, and active erosional processes. Since slope and land use are also used in the physically-based approach, all model results are largely controlled by the same leading variables. Overlaying susceptibility maps produced by the different methods (statistical vs. physically-based) for the same terrain unit of reference (grid cell) reveals a large difference, nearly 25% spatial mismatch. The spatial discrepancy exceeds 30% for susceptibility maps generated by the same method (discriminant analysis) but different terrain units (slope unit vs. grid cell). The size of the terrain unit also led to different susceptibility maps (almost 20% spatial mismatch). Maps based on different statistical tools (discriminant analysis vs. logistic regression) differed least (less than 10%). Hence, method and terrain unit proved to be equally important in mapping susceptibility. Model performance was evaluated from the percentages of terrain units that each model correctly classifies, the number of debris flows falling within the area classified as unstable by each model, and through the metric of ROC curves.
Although all techniques implemented yielded essentially comparable results, the discriminant model based on the partition of the study area into small slope units may constitute the most suitable approach to regional debris-flow assessment in the Alpine environment.

11.
Landslides can cause the formation of dams, but these dams often fail soon after lake formation. Thus, rapidly evaluating the stability of a landslide dam is crucial for effective hazard mitigation. This study utilizes discriminant analysis based on a Japanese dataset consisting of 43 well-documented landslide dams to determine the significant variables, including log-transformed peak flow (or catchment area), and log-transformed dam height, width and length in hierarchical order, which affect the stability of a landslide dam. The high overall prediction power (88.4% of the 43 training cases are correctly classified) and the high cross-validation accuracy (86%) demonstrate the robustness of the proposed discriminant models PHWL (with variables including log-transformed peak flow, and log-transformed dam height, width and length) and AHWL (with variables including log-transformed catchment area, and log-transformed dam height, width and length). Compared to a previously proposed “DBI” index-based graphic approach, the discriminant model AHV – which uses the log-transformed catchment area, dam height, and dam volume as relevant variables – shows better ability to evaluate the stability of landslide dams. Although these discriminant models are established using a Japanese dataset only, the present multivariate statistical approach can be applied for an expanded dataset without any difficulty when more completely documented worldwide landslide-dam data are available.

12.
Urban multiple land use change (LUC) modelling enables the realistic simulation of LUC processes in complex urban systems; however, such modelling suffers from technical challenges posed by complicated transition rules and high spatial heterogeneity when predicting the LUC of a highly developed area. Tree-based methods are powerful tools for addressing this task, but their predictive capabilities need further examination. This study integrates tree-based methods and cellular automata to simulate multiple LUC processes in the Greater Tokyo Area. We examine the predictive capability of 4 tree-based models – bagged trees, random forests, extremely randomised trees (ERT) and bagged gradient boosting decision trees (bagged GBDT) – on transition probability prediction for 18 land use transitions derived from 8 land use types. We compare the predictive power of the tree-based models with that of a multi-layer perceptron (MLP) and with one another. The results show that tree-based models generally perform better than MLP, and ERT significantly outperforms the three other tree-based models. The outstanding predictive performance of ERT demonstrates the advantages of introducing bagging ensemble and a high degree of randomisation into transition probability modelling. In addition, through variable importance evaluation, we found that neighbourhood characteristics have the strongest explanatory power for all land use transitions; however, the size of the impacts depends on the neighbourhood land use type and the neighbourhood size. Furthermore, socio-economic and policy factors play important roles in transitions ending with high-rise buildings and transitions related to industrial areas.

13.
14.
Multiple sinkhole susceptibility models have been generated in three study areas of the Ebro Valley evaporite karst (NE Spain) applying different methods (nearest neighbour distance, sinkhole density, heuristic scoring system and probabilistic analysis) for each sinkhole type separately (cover collapse sinkholes, cover and bedrock collapse sinkholes and cover and bedrock sagging sinkholes). The quantitative and independent evaluation of the predictive capability of the models reveals that: (1) The most reliable susceptibility models are those derived from the nearest neighbour distance and sinkhole density. These models can be generated in a simple and rapid way from detailed geomorphological maps. (2) The reliability of the nearest neighbour distance and density models is conditioned by the degree of clustering of the sinkholes. Consequently, the karst areas in which sinkholes show a higher clustering are a priori more favourable for predicting new occurrences. (3) The predictive capability of the best models obtained in this research is significantly higher (12.5–82.5%) than that of the heuristic sinkhole susceptibility model incorporated into the General Urban Plan for the municipality of Zaragoza. Although the probabilistic approach provides lower quality results than the methods based on sinkhole proximity and density, it helps to identify the most significant factors and select the most effective mitigation strategies and may be applied to model susceptibility in different future scenarios.

15.
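Guided by the theories, methods and techniques of earth system science, geo-information science and modern cartography, this paper systematically studies and establishes an indicator system and knowledge rules for geographic-feature-oriented cartographic generalization, and analyzes a case-study application. The indicator system and knowledge rules were summarized, refined and established through geoscientific analysis and induction, map analysis, expert consultation, and GIS and remote-sensing spatial analysis. There are three types of indicators (data indicators, textual-description indicators and graphic indicators), which fall into two classes: database generalization (i.e. semantic generalization) and map-visualization generalization (i.e. graphic generalization). Horizontally, the knowledge rules consist of geometric, structural and procedural knowledge; vertically, they are organized and classified by process and aspect: descriptive knowledge of the geographic characteristics of features, rules for selecting operators, rules for selecting algorithms, generalization rules oriented to specific geographic elements, and generalization rules oriented to regions. Within the knowledge base, the rules are organized by three variables, namely generalization condition, generalization action and generalization requirement (or generalization level), forming an internal system of knowledge rules with a three-dimensional coordinate relationship. The case study describes the process and results of cartographic generalization for a transportation network map of the Pearl River Delta Economic Zone.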

16.
The standard deviation of prediction errors (SDEP) is used to evaluate and compare the predictive ability of some regression models, namely MLR, ACE and linear and non-linear PLS, the last being the best one. The parameter is determined by a cross-validation approach as an average of several runs obtained on forming groups in a random way. The variation in SDEP with the number of latent variables in PLS is also discussed.
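As a rough illustration of how such a parameter might be computed, the sketch below averages SDEP over several random group assignments. It rests on assumptions not stated in the abstract: SDEP is taken as the square root of the summed squared cross-validation errors divided by the number of samples, the model is ordinary multiple linear regression (MLR) fitted by least squares, and the function names and the defaults of 5 groups and 20 runs are hypothetical.

```python
import numpy as np

def mlr_predict(X_train, y_train, X_test):
    """Multiple linear regression with an intercept, fitted by least squares."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def sdep(X, y, n_groups=5, n_runs=20, seed=0):
    """Average SDEP over n_runs random assignments of samples to n_groups."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = []
    for _ in range(n_runs):
        # Shuffle the samples, then deal them into groups of near-equal size.
        groups = rng.permutation(n) % n_groups
        press = 0.0
        for g in range(n_groups):
            test = groups == g
            pred = mlr_predict(X[~test], y[~test], X[test])
            press += np.sum((y[test] - pred) ** 2)
        values.append(np.sqrt(press / n))   # assumed definition of SDEP
    return float(np.mean(values))
```

Any other model (for example PLS with a given number of latent variables) could be passed through the same loop by swapping `mlr_predict` for a function with the same signature.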

17.
Scaled chrysophytes in the surface sediments of 58 soft-water northern New England lakes were analyzed to assess their usefulness for inferring pH. The distributions of many taxa are correlated with lakewater pH and associated variables. Canonical correspondence analysis (CCA) and clustering grouped chrysophyte taxa according to their distributions along the pH gradient. For example, Chrysodidymus synuroideus, Mallomonas hindonii, and M. hamata commonly occur in acidic waters (pH<5.5), whereas M. caudata and M. pseudocoronata are common in circumneutral to alkaline waters. Of the five predictive models developed to infer pH, CCA based calibration had the lowest standard error (0.35 pH units). A CCA based predictive model was also developed to infer total alkalinity. The study provides strong evidence that, in the absence of past measured pH data, stratigraphic studies of sedimentary chrysophyte scales will provide accurate reconstructions of pH in northern New England lakes. This is the sixth of a series of papers to be published by this journal which is a contribution of the Paleoecological Investigation of Recent Lake Acidification (PIRLA) project. Drs. D.F. Charles and D.R. Whitehead are guest editors for this series.

18.
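Studying the environmental covariates that influence the spatial distribution of different soil properties, and the scales at which they operate, is important for understanding the pedogenesis of those properties, for predictive soil mapping, and for designing spatial sampling schemes that target multiple soil properties. For multiple soil properties, this study explores the important related environmental factors of each property and their scales of effect, and examines how differences in environmental factors and their scales affect predictive mapping of soil properties. Heshan Farm in Heilongjiang Province was the study area, and five soil properties were studied: topsoil sand, silt, clay and organic matter content, and soil thickness. By varying the size of the neighbourhood window used in the calculation, 173 terrain factors at different scales were generated. Single-scale and multi-scale terrain factors were ranked by importance, and the rankings were used to build single-scale environmental factor set 1 and multi-scale environmental factor set 2, whose mapping accuracy was compared with that of baseline environmental factor set 3, selected on the basis of expert knowledge. The results show that when single-scale terrain factors were ranked and selected by importance, the important related environmental factors chosen for the five soil properties differed markedly from baseline set 3. When multi-scale environmental factors were included, the great majority of top-ranked factors for each soil property were baseline environmental factors, although their scales of effect differed among properties. Sand and silt had similar important factors and scales of effect, which differed greatly from those of clay; organic matter and soil thickness had very similar important factors. Mapping accuracy with set 2 was significantly higher than with baseline set 3, with mean RMSE improved by 7.8% to 21.3%, and mean RMSE improved by 8.7% to 16.5% relative to set 1. Therefore, when mapping or designing sampling for different soil properties, differences in their environmental factors and scales of effect should be fully considered; choosing an appropriate scale for the baseline environmental factors matters more than choosing different related environmental factors.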

19.
Predictive pH models developed using scaled chrysophytes (Synurophyceae, Chrysophyceae) have thus far been based on the relative abundance of scales and not whole cells. This paper examines the effects of transforming scale to cell numbers on the predictive abilities of pH inference models, and the effects of logarithmic and square-root transformations of the species data on the predictive abilities of pH inference models. Very similar pH inference models were developed based on either the relative abundance of scales or cells. Thus, in this data-set, there appears to be no statistical advantage in transforming raw scale counts to cell counts prior to calculating the relative abundances. However, if one wishes to compare paleochrysophyte populations to actual long-term limnological chrysophyte collections, a scale-to-cell transformation would be desirable. Logarithmic and square-root transformations of the species data improve the pH inference models. These transformations increase the effective number of occurrences of chrysophyte taxa when compared to the untransformed scale and cell pH models. The logarithmic and square-root transformations improve the pH inference models because the dominant taxa, which are often pH generalists, are down-weighted in comparison to the more pH specialist, sub-dominant taxa. We suggest researchers use either a logarithmic or square-root transformation on chrysophyte scale data to improve quantitative reconstructions of lakewater pH and possibly other variables.
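The effect of these transformations on the effective number of occurrences can be illustrated with a small example. The sketch below assumes that "effective number of occurrences" means Hill's N2 computed for each taxon across samples, and that the logarithmic transformation is ln(x + 1); neither detail is confirmed by the abstract, and the count matrix and function names are hypothetical.

```python
import numpy as np

def relative_abundance(counts):
    """Convert a samples x taxa count matrix to percentages within each sample."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

def hill_n2(abundance):
    """Hill's N2 per taxon: 1 / sum of squared shares of that taxon across samples."""
    abundance = np.asarray(abundance, dtype=float)
    share = abundance / abundance.sum(axis=0, keepdims=True)
    return 1.0 / np.sum(share ** 2, axis=0)

# Hypothetical scale counts: 4 lakes (rows) x 3 taxa (columns).
counts = np.array([[180, 15, 5],
                   [120, 60, 20],
                   [30, 150, 20],
                   [10, 170, 20]])

pct = relative_abundance(counts)
n2_raw = hill_n2(pct)              # untransformed percentages
n2_sqrt = hill_n2(np.sqrt(pct))    # square-root transformation
n2_log = hill_n2(np.log1p(pct))    # ln(x + 1), an assumed form of the log transform
```

For these counts both transformations raise N2 for every taxon, because they compress the largest percentages the most and so spread each taxon's weight more evenly across the lakes in which it occurs.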

20.
A TEST OF SIGNIFICANCE FOR PARTIAL LEAST SQUARES REGRESSION   Total citations: 1 (self-citations: 0, citations by others: 1)
Partial least squares (PLS) regression is a commonly used statistical technique for performing multivariate calibration, especially in situations where there are more variables than samples. Choosing the number of factors to include in a model is a decision that all users of PLS must make, but is complicated by the large number of empirical tests available. In most instances predictive ability is the most desired property of a PLS model and so interest has centred on making this choice based on an internal validation process. A popular approach is the calculation of a cross-validated r² to gauge how much variance in the dependent variable can be explained from leave-one-out predictions. Using Monte Carlo simulations for different sizes of data set, the influence of chance effects on the cross-validation process is investigated. The results are presented as tables of critical values which are compared against the values of cross-validated r² obtained from the user's own data set. This gives a formal test for predictive ability of a PLS model with a given number of dimensions.
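One way such a test could be set up is sketched below. It is not the paper's simulation design, which the abstract does not specify. The sketch assumes a single-response PLS (PLS1) fitted by the NIPALS algorithm, a cross-validated r² defined as 1 - PRESS / SS about the mean, and a null distribution generated from independent standard-normal X and y; all function names and default settings are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_factors):
    """PLS1 regression coefficients by NIPALS; X and y must be mean-centred."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # weight vector
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        W.append(w)
        P.append(p)
        q.append(yk @ t / tt)            # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - q[-1] * t              # deflate y
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def loo_q2(X, y, n_factors):
    """Cross-validated r-squared from leave-one-out predictions."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        x_mean, y_mean = X[keep].mean(axis=0), y[keep].mean()
        b = pls1_coefficients(X[keep] - x_mean, y[keep] - y_mean, n_factors)
        press += (y[i] - (y_mean + (X[i] - x_mean) @ b)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def critical_q2(n_samples, n_vars, n_factors, n_trials=200, level=0.95, seed=0):
    """Monte Carlo critical value: quantile of q2 when y is unrelated to X."""
    rng = np.random.default_rng(seed)
    null = [loo_q2(rng.normal(size=(n_samples, n_vars)),
                   rng.normal(size=n_samples), n_factors)
            for _ in range(n_trials)]
    return float(np.quantile(null, level))
```

A model whose cross-validated r² exceeds `critical_q2` for the same number of samples, variables and factors would be judged unlikely to owe its apparent predictive ability to chance, under the assumed null model.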
