Similar Documents
20 similar documents found (search time: 312 ms)
1.
Isometric Logratio Transformations for Compositional Data Analysis
Geometry in the simplex has been developed in the last 15 years mainly based on the contributions due to J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to get a further insight into some aspects of this geometry in order to clarify the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow for a straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This gives the key to join compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism and orthogonality in the simplex.
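The clr and ilr maps described in this abstract can be sketched in a few lines of NumPy. The pivot (Helmert-type) orthonormal basis used below is one standard choice; the paper's balancing-element construction may yield a different (rotated) basis, so this is an illustrative sketch rather than the paper's exact transformation:

```python
import numpy as np

def clr(x):
    """Centered-logratio transform: log of each part over the geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))
    return np.log(x / g)

def ilr(x):
    """Isometric-logratio transform of a D-part composition to D-1 real
    coordinates, using the pivot (Helmert-type) orthonormal basis.
    Being an isometry, it preserves the Aitchison metric exactly."""
    x = np.asarray(x, dtype=float)
    D = len(x)
    coords = np.empty(D - 1)
    for i in range(1, D):
        # Balance between the geometric mean of the first i parts and part i+1.
        gm = np.exp(np.mean(np.log(x[:i])))
        coords[i - 1] = np.sqrt(i / (i + 1.0)) * np.log(gm / x[i])
    return coords

comp = np.array([0.2, 0.3, 0.5])
z = ilr(comp)  # two unconstrained real coordinates for a 3-part composition
```

Because the basis is orthonormal, Euclidean distance between ilr coordinates equals the Aitchison distance (the Euclidean distance between clr images), which is the "isometric" property the abstract refers to.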

2.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers strongly interact: parameters of transformations may be influenced particularly by outliers, and the result of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional datasets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection together with the likelihood of transformed data is presented as a first approach to solve those problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.

3.
The high-dimensionality of many compositional data sets has caused geologists to look for insights into the observed patterns of variability through two dimension-reducing procedures: (i) the selection of a few subcompositions for particular study, and (ii) principal component analysis. After a brief critical review of the unsatisfactory state of current statistical methodology for these two procedures, this paper takes as a starting point for the resolution of persisting difficulties a recent approach to principal component analysis through a new definition of the covariance structure of a composition. This approach is first applied for expository purposes to a small illustrative compositional data set and then to a number of larger published geochemical data sets. The new approach then leads naturally to a method of measuring the extent to which a subcomposition retains the pattern of variability of the whole composition and so provides a criterion for the selection of suitable subcompositions. Such a selection process is illustrated by application to geochemical data sets.

4.
A correction model for conditional bias in selective mining operations
A nonlinear correction function K(Z*) is proposed to transform any initial linear grade estimator Z* into a conditionally unbiased estimator Z** = K(Z*) with reduced conditional estimation variance. Such a corrected estimator allows more accurate prediction of ore reserves at any level of selection performed during the mine lifetime. The correction is based upon an analytical or isofactorial representation of a bivariate distribution model of the true grade Z and its estimator Z*. This correction model allows derivation of conditional estimation variances for both estimators Z* and Z** and provides a solution to the problem of change of support. A case study is presented and performance of the proposed correction model is evaluated in terms of actual conditional bias and mean squared errors. Results obtained stress the practical importance of the correction model in selective mining operations.

5.
Commonly, geological studies compare mean values of two or more compositional data suites in order to determine if, how, and by how much they differ. Simple approaches for evaluating and statistically testing differences in mean values for open data fail for compositional (closed) data. A new parameter, an f-value, therefore has been developed, which correctly quantifies the differences among compositional mean values and allows testing those differences for statistical significance. In general, this parameter quantifies only the relative factor by which compositional variables differ across data suites; however, for situations where, arguably, at least one component has neither increased nor decreased, an absolute f-value can be computed. In situations where the compositional variables have undergone many perturbations, arguments based upon the f-values and the central limit theorem indicate that logratios of compositional variables should be normally distributed.

6.
In an open pit mine, the selection of blocks for mill feed necessitates the use of a conditionally unbiased estimator not only to maximize profits, but also to predict precisely the grades at the mill. Estimation of blocks usually is done using a series of blasthole assays on a regular grid. In many instances, the blasthole grades show a lognormal-like distribution. This study examines an estimator based on the hypothesis of bilognormality between the true block grade and the estimate obtained using the blastholes. The properties of the estimator are established and the estimator is proven to be conditionally unbiased. It is almost as precise as the lognormal kriging estimator when the points are multilognormal. However, it is more precise than lognormal krigings when only univariate lognormality is present or when the distribution is not exactly lognormal. The estimator also is shown to be robust to errors in the specifications of the variogram model or of the expectation of Z. Contrary to lognormal krigings, the estimator applies only a slight correction to the original estimate obtained using the blasthole assays.

7.
The geometric average is often used to estimate the effective (large-scale) permeability from smaller-scale samples. In doing so, one assumes that the geometric average is a good estimator of the geometric mean. Problems with this estimator arise, however, when one or more of the samples has a very low value. The estimate obtained becomes very sensitive to the small values in the sample set, while the true effective permeability may be only weakly dependent on these small values. Several alternative methods of estimating the geometric mean are suggested. In particular, a more robust estimator of the geometric mean, the jth Winsorized mean, is proposed and several of its properties are compared with those of the geometric average.
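The sensitivity described in this abstract is easy to demonstrate. The sketch below implements one plausible reading of a jth Winsorized estimator of the geometric mean (symmetric Winsorization of the log-values); the paper's exact definition may differ:

```python
import numpy as np

def geometric_average(x):
    """Plain geometric average: exp of the mean of the logs."""
    return float(np.exp(np.mean(np.log(np.asarray(x, dtype=float)))))

def winsorized_geometric_mean(x, j):
    """Geometric mean after symmetrically Winsorizing the j smallest and
    j largest log-values (requires 2*j < len(x)). Extreme logs are pulled
    in to their nearest retained neighbour before averaging, which damps
    the influence of anomalously low samples."""
    logs = np.sort(np.log(np.asarray(x, dtype=float)))
    if j > 0:
        logs[:j] = logs[j]          # raise the j smallest to the (j+1)th
        logs[-j:] = logs[-j - 1]    # lower the j largest to the (j+1)th-largest
    return float(np.exp(np.mean(logs)))

# One anomalously low permeability sample dominates the plain geometric average,
# while the Winsorized estimate stays near the bulk of the data.
perms = [120.0, 95.0, 110.0, 0.001, 130.0]
plain = geometric_average(perms)
robust = winsorized_geometric_mean(perms, 1)
```

With these illustrative numbers the plain geometric average collapses to about 11, while the j = 1 Winsorized estimate stays above 100, close to the four well-behaved samples.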

8.
A variety of approaches to the testing of distributional forms for compositional data has appeared in the literature, all based on logratio or Box–Cox transformation techniques and to a degree dependent on the divisor chosen in the formation of ratios for these transformations. This paper, recognizing the special algebraic–geometric structure of the standard simplex sample space for compositional problems, the use of the fundamental simplicial singular value decomposition, and an associated power-perturbation characterization of compositional variability, attempts to provide a definitive approach to such distributional testing problems. Our main consideration is the characterization and testing of additive logistic–normal form, but we also indicate possible applications to logistic skew normal forms once a full range of multivariate tests emerges. The testing strategy is illustrated with both simulated data and applications to some real geological compositional data sets.

9.
At present, streamflow-measurement uncertainty is determined through error experiments or from empirical values, but these approaches suffer from either a heavy workload or an inadequate estimate of the uncertainty. To address this, the interpolation-variance estimation method, based on measured data and statistical theory, was validated under different flow-measurement conditions. Uncertainty analyses of measured data were carried out for three gauging stations (Baihe, Xiangyang, and Shayang), and a Monte Carlo experiment was conducted for the Baihe station to compare the uncertainty obtained by the interpolation-variance estimation method with the true error. The results show that the method reflects the influence of stage variation well: the correlation coefficient between its uncertainty estimates and the true flow-measurement errors reaches 0.64, and the Spearman correlation with the stage variation of the cross-section reaches 0.79. The uncertainty estimates are reasonable at high and medium stages but biased high at low stages.

10.
Operator error in petrographic point-count analysis introduces bias into the estimates of proportion in a thin section. A correction for this bias, leading to an unbiased estimator of the true proportion in that thin section, is here proposed. Operator error also affects the confidence interval, and in this situation, too, an adjustment is possible. The approach proposed requires that the probabilities associated with operator error, categorized into A-type and B-type errors, are known or assumed. The A-type operator error tends to underestimate the true proportion in a thin section, whereas the B-type operator error tends to overestimate it.
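A standard misclassification correction illustrates the kind of bias adjustment this abstract describes. Assuming (hypothetically; the paper's exact estimator may differ) that a target grain is missed with probability a (A-type, underestimating the proportion) and a non-target grain is wrongly counted with probability b (B-type, overestimating it), the expected observed proportion is E[p_obs] = p(1 − a) + (1 − p)b, which can be inverted for p:

```python
def corrected_proportion(p_obs, a, b):
    """Invert E[p_obs] = p*(1 - a) + (1 - p)*b to estimate the true
    proportion p from the observed point-count proportion p_obs, given
    known (or assumed) A-type miss rate a and B-type false-count rate b.
    This is a generic misclassification correction, sketched here as one
    plausible form of the paper's estimator."""
    if a + b >= 1.0:
        raise ValueError("error rates too large: need a + b < 1")
    return (p_obs - b) / (1.0 - a - b)

# e.g. 30% observed, 5% miss rate, 2% false-count rate
p_hat = corrected_proportion(0.30, 0.05, 0.02)
```

Note the two errors pull in opposite directions, matching the abstract: with b = 0 the correction raises the estimate (A-type underestimation), and with a = 0 it lowers it.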

11.
Spatial declustering weights
Because of autocorrelation and spatial clustering, the data within a given dataset do not all carry the same statistical weight for estimation of global statistics such as the mean, variance, or quantiles of the population distribution. A measure of redundancy (or nonredundancy) of any given regionalized random variable Z(uα) within any given set (of size N) of random variables is proposed. It is defined as the ratio of the determinant of the N × N correlation matrix to the determinant of the (N − 1) × (N − 1) correlation matrix excluding the random variable Z(uα). This ratio measures the increase in redundancy when adding the random variable Z(uα) to the N − 1 others. It can be used as a declustering weight for any outcome (datum) z(uα). When the redundancy matrix is a kriging covariance matrix, the proposed ratio is the cross-validation simple kriging variance. The covariance of the uniform scores of the clustered data is proposed as a redundancy measure robust with respect to data clustering.
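The determinant-ratio measure lends itself to a direct sketch. The toy correlation matrix below is illustrative (not from the paper): two strongly correlated, clustered points receive smaller ratios, and thus smaller declustering weights, than an isolated point:

```python
import numpy as np

def redundancy_ratios(C):
    """For an N x N correlation matrix C, return for each variable k the
    ratio det(C) / det(C with row/column k removed) -- the redundancy
    measure of the abstract. Redundant (clustered) variables yield small
    ratios, so the ratios can serve as unnormalised declustering weights."""
    C = np.asarray(C, dtype=float)
    N = C.shape[0]
    detC = np.linalg.det(C)
    ratios = np.empty(N)
    for k in range(N):
        keep = [i for i in range(N) if i != k]
        ratios[k] = detC / np.linalg.det(C[np.ix_(keep, keep)])
    return ratios

# Points 0 and 1 are strongly correlated (clustered); point 2 is isolated.
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
w = redundancy_ratios(C)  # w[2] is much larger than w[0] and w[1]
```

Normalising the ratios to sum to one would give weights usable directly in a weighted global mean, down-weighting the clustered pair.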

12.
Soil erosion is one of the most widespread degradation processes. The erodibility of a soil is a measure of its susceptibility to erosion and depends on many soil properties. The soil erodibility factor varies greatly over space and is commonly estimated using the revised universal soil loss equation. Neglecting information about estimation uncertainty may lead to improper decision-making. One geostatistical approach to spatial analysis is sequential Gaussian simulation, which draws alternative, equally probable, joint realizations of a regionalised variable. Differences between the realizations provide a measure of spatial uncertainty and allow us to carry out an error analysis. The objective of this paper was to assess the model output error of soil erodibility resulting from the uncertainties in the input attributes (texture and organic matter). The study area covers about 30 km² (Calabria, southern Italy). Topsoil samples were collected at 175 locations within the study area in 2006 and the main chemical and physical soil properties were determined. As soil textural size fractions are compositional data, the additive-logratio (alr) transformation was used to remove the non-negativity and constant-sum constraints on compositional variables. A Monte Carlo analysis was performed, which consisted of drawing a large number (500) of identically distributed input attributes from the multivariable joint probability distribution function. We incorporated spatial cross-correlation information through joint sequential Gaussian simulation, because the model inputs were spatially correlated. The erodibility model was then estimated for each of the 500 joint realisations of the input variables, and the ensemble of the model outputs was used to infer the erodibility probability distribution function. This approach also allowed the areas characterised by greater uncertainty to be delineated, suggesting efficient supplementary sampling strategies for further improving the precision of K value predictions.

14.
When estimating the mean value of a variable, or the total amount of a resource, within a specified region it is desirable to report an estimated standard error for the resulting estimate. If the sample sites are selected according to a probability sampling design, it usually is possible to construct an appropriate design-based standard error estimate. One exception is systematic sampling, for which no such standard error estimator exists. However, a slight modification of systematic sampling, termed 2-step tessellation stratified (2TS) sampling, does permit the estimation of design-based standard errors. This paper develops a design-based standard error estimator for 2TS sampling. It is shown that the Taylor series approximation to the variance of the sample mean under 2TS sampling may be expressed in terms of either a deterministic variogram or a deterministic covariance function. Variance estimation then can be approached through the estimation of a variogram or a covariance function. The resulting standard error estimators are compared to some more traditional variance estimators through a simulation study. The simulation results show that estimators based on the new approach may perform better than traditional variance estimators.

15.
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability and the possible processes associated with compositional data sets from many disciplines. In this paper, we concentrate on geochemical data. First, we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of sub-compositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, such as the singular value decomposition of a compositional data set. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major oxide and trace element compositions of metamorphosed limestones from the Grampian Highlands of Scotland. Finally, we discuss some unresolved problems in the statistical analysis of compositional processes.

16.
Estimation of regionalized compositions: A comparison of three methods
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence—induced by the constant sum constraint—is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-logratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-logratio approach are comparable.
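The logratio approach in method (3) hinges on the alr transform and its inverse. A minimal sketch (estimation of the transformed coordinates themselves, e.g. by cokriging, is omitted; note too that naively back-transforming kriged alr values is not an unbiased estimator of the compositional mean):

```python
import numpy as np

def alr(x):
    """Additive-logratio transform: log of the first D-1 parts over the
    last part. Frees a D-part composition from the non-negativity and
    constant-sum constraints, giving D-1 unconstrained real coordinates."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_inv(y, total=1.0):
    """Back-transform alr coordinates to a composition summing to `total`
    (the inverse, or 'logistic', transformation)."""
    e = np.exp(np.append(np.asarray(y, dtype=float), 0.0))
    return total * e / e.sum()

comp = np.array([0.55, 0.30, 0.15])   # e.g. sand / silt / clay fractions
y = alr(comp)       # two unconstrained reals; these are what one would krige
back = alr_inv(y)   # round-trips to the original composition
```

The choice of divisor (here the last part) matters for interpretation but not for the round trip; the singular cokriging system of the raw composition is avoided because the D − 1 alr coordinates carry no constant-sum constraint.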

17.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data will always be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.

18.
Kriging, as an interpolation method, uses as its predictor a linear function of the observations, minimizing the mean squared prediction error or estimation variance. Under multivariate normality assumptions, the given predictor is the best unbiased predictor, and will be vulnerable to outliers. To overcome this problem, a robust weighted estimator of the drift model coefficients is proposed, where unequally spaced data may be weighted through the tile areas of the Dirichlet tessellation.

19.
Soil properties are indispensable input parameters in geotechnical design and analysis. In engineering practice, particularly for projects of relatively small or medium size, soil properties are often not measured directly, but estimated from geotechnical design charts using results of some commonly used laboratory or in situ tests. For example, the effective friction angle φ′ of soil is frequently estimated using standard penetration test (SPT) N values and design charts relating SPT N values to φ′. Note that directly measured φ′ data are generally not available when (and probably why) the use of design charts is needed. Because design charts are usually developed from past observation data, on either an empirical or semi-theoretical basis, uncertainty is unavoidably involved in the design charts. This situation leads to two important questions in engineering practice: (1) how good or reliable are the soil properties estimated at a specific site when using the design charts (i.e., how to measure the performance of the design charts at a specific site)? and (2) how to incorporate rationally the model uncertainty when estimating soil properties using the design charts? This paper aims to address these two questions by developing a Bayesian statistical approach. The second question is addressed first (i.e., soil properties are probabilistically characterized by rationally incorporating the model uncertainty in the design chart). Then, based on the characterization results obtained, an index is proposed to evaluate the site-specific performance of design charts (i.e., to address the first question). Equations are derived for the proposed approach, and the proposed approach is illustrated using both real and simulated SPT data. Copyright © 2016 John Wiley & Sons, Ltd.

20.
When concerned with spatial data, it is not unusual to observe a nonstationarity of the mean. This nonstationarity may be modeled through linear models, with the fitting of variograms or covariance functions performed on residuals. Although it usually is accepted by authors that a bias is present if residuals are used, its importance is rarely assessed. In this paper, an expression of the variogram and the covariance function is developed to determine the expected bias. It is shown that the magnitude of the bias depends on the sampling configuration, the importance of the dependence between observations, the number of parameters used to model the mean, and the number of data. The applications of the expression are twofold. The first is to evaluate a priori the importance of the bias expected when a residual-based variogram model is used for a given configuration and a hypothetical data dependence. The second is to extend the weighted least-squares method to fit the variogram and to obtain an unbiased estimate of the variogram. Two case studies show that the bias can be negligible or larger than 20%. The residual-based sample variogram underestimates the total variance of the process, but the nugget variance may be overestimated.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号