Similar literature
 20 similar articles found (search time: 31 ms)
1.
Isometric Logratio Transformations for Compositional Data Analysis   (total citations: 37; self-citations: 0; cited by others: 37)
Geometry in the simplex has been developed in the last 15 years mainly based on the contributions due to J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to get a further insight into some aspects of this geometry in order to clarify the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow for a straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This gives the key to join compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism and orthogonality in the simplex.
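
As a rough illustration of the isometry described in this abstract, the following Python sketch computes clr coefficients and one possible set of ilr coordinates, then compares the distance between two compositions in both representations. The particular orthonormal basis (a Helmert-type sequence of balances), the function names, and the sample compositions are assumptions made for illustration; the abstract itself does not prescribe a specific basis.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered logratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def ilr(x):
    """ilr coordinates for one assumed orthonormal basis (Helmert-type).

    Coordinate i contrasts the geometric mean of the first i parts
    with part i + 1, scaled so that the basis vector has unit length.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords

def euclid(a, b):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical three-part compositions.
x = closure([10.0, 30.0, 60.0])
y = closure([25.0, 25.0, 50.0])

d_clr = euclid(clr(x), clr(y))  # distance computed from clr coefficients
d_ilr = euclid(ilr(x), ilr(y))  # distance computed from ilr coordinates
print(d_clr, d_ilr)             # the two values should agree up to rounding
```

A D-part composition yields D clr coefficients that sum to zero but only D - 1 ilr coordinates, which is why the ilr representation avoids the singularity of the clr covariance matrix.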

2.
A variety of approaches to the testing of distributional forms for compositional data has appeared in the literature, all based on logratio or Box–Cox transformation techniques and to a degree dependent on the divisor chosen in the formation of ratios for these transformations. This paper, recognizing the special algebraic–geometric structure of the standard simplex sample space for compositional problems, the use of the fundamental simplicial singular value decomposition, and an associated power-perturbation characterization of compositional variability, attempts to provide a definitive approach to such distributional testing problems. Our main consideration is the characterization and testing of additive logistic–normal form, but we also indicate possible applications to logistic skew normal forms once a full range of multivariate tests emerges. The testing strategy is illustrated with both simulated data and applications to some real geological compositional data sets.

3.
Spurious Clusters in Granulometric Data Caused by Logratio Transformation   (total citations: 1; self-citations: 0; cited by others: 1)
The logratio transformation aims to eliminate spurious correlations between components of compositional data. When applying this method to granulometric data, there arise numerical problems with zero (empty) components. In this paper, the method of logratio transformation with zero replacement is examined using one natural and two simulated granulometric data sets. The results show that this method generates spurious clusters, and thus it is not appropriate for the investigation of grain-size data in particular, and of compositional data with zero components in general.
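
One way to see why zero replacement can dominate a logratio analysis is to look at how strongly the transformed value of a replaced part depends on the replacement value. The sketch below uses a deliberately simple replacement rule (substitute a small value, then re-close) and hypothetical grain-size fractions; both are assumptions for illustration and are not necessarily the procedure examined in the paper.

```python
import math

def closure(x):
    """Rescale parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def replace_zeros(x, delta):
    """Assumed simple rule: put delta into empty parts, then re-close."""
    return closure([v if v > 0 else delta for v in x])

def clr(x):
    """Centered logratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Hypothetical grain-size fractions with one empty class.
sample = [0.0, 0.35, 0.65]

for delta in (1e-2, 1e-4, 1e-6):
    coords = clr(replace_zeros(sample, delta))
    print(delta, [round(c, 3) for c in coords])

# clr value of the replaced part for a large and a small replacement value.
first_large = clr(replace_zeros(sample, 1e-2))[0]
first_small = clr(replace_zeros(sample, 1e-6))[0]
```

Because the logarithm is unbounded near zero, the position of a sample with an empty class is set largely by the arbitrary choice of `delta`, so samples that share a zero can end up grouped together regardless of their other parts.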

4.
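The key task in geochemical exploration is the correct interpretation of geochemical data, so that anomalies related to mineralization can be extracted accurately from complex geological information and used to guide prospecting. However, geochemical data are compositional data and are subject to the closure effect; multivariate statistical methods can be applied, and the true spatial distribution of elements recovered, only after the data have been preprocessed correctly. In this study, 1009 surface primary-halo samples were collected from the area south of the periphery of the Ashele copper-zinc deposit, and 13 trace elements were measured in each sample. Exploratory data analysis (EDA) was carried out on the raw data, the log-transformed data, and the ilr-transformed data to compare their spatial distributions and internal structure. (Robust) principal component analysis, combined with compositional biplots and maps of first principal component scores, was used to interpret how the element associations indicated by the three types of data relate to mineralization. Multifractal filtering was then applied to the robust principal component scores based on the ilr transformation to separate the anomaly and background patterns of the element associations. The results show that: (1) compared with the raw data, the log-transformed and ilr-transformed data are more evenly scaled and approximately normally distributed; (2) the biplots of the three types of data show that only the ilr-transformed data eliminate the closure effect, and the element grouping of their first principal component reveals the association of copper mineralization with lead-zinc polymetallic mineralization in the study area; the first principal component score maps based on the log and ilr transformations show that the score anomalies of the latter are the better indicator of geological prospecting information in the study area; (3) by combining the geological prospecting information of the study area with the anomaly and background distributions of the element associations, three favorable prospecting targets were delineated.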

5.
Criteria to Compare Estimation Methods of Regionalized Compositions   (total citations: 1; self-citations: 0; cited by others: 1)
The additive logratio (alr) transformation has been used in several case studies to predict regionalized compositions using standard geostatistical estimation methods such as ordinary kriging and ordinary cokriging. It is a simple method that allows the whole body of knowledge available for the geostatistical analysis of coregionalizations without a constant sum constraint to be applied to the transformed data. To compare the performance of methods, it is customary to use a univariate crossvalidation approach based on the leaving-one-out technique to evaluate the performance for each attribute separately. For multivariate observations this approach is difficult to interpret in terms of overall performance. Therefore, we propose using appropriate distances in real space and in the simplex to improve the crossvalidation approach and, going a step further, adapting the concept of stress from multidimensional scaling to obtain a global measure of performance for each method. The Lyons West oil field of Kansas is used to illustrate the impact of using different distances on the performance of ordinary kriging versus ordinary cokriging.

6.
The purpose of this study was to capture the structure of a geological process within a multivariate statistical framework by using geological data generated by that process and, where applicable, by associated processes. It is important to the practitioners of statistical analysis in geology to determine the degree to which the geological process can be captured and explained by multivariate analysis by using sample data (for example, chemical analyses) taken from the geological entity created by that process. The process chosen for study here is the creation of a coal deposit. In this study, the data are chemical analyses expressed in weight percentage and parts per million, and therefore are subject to the effects of the constant sum phenomenon. The data array is the chemical composition of the whole coal. This restriction on the data imposed by the constant sum phenomenon was removed by using the centered logratio (clr) transformation. The use of scatter plots and principal component biplots applied to the raw and centered logratio (clr) transformed data arrays affects the interpretation and comprehension of the geological process of coalification.

7.
Grain-size measurements are a type of compositional data and thus subject to closure effects and nonnormality. The logratio transform of Aitchison successfully resolves these problems in compositional data analysis. An application to modern sediment data from the northern part of the South China Sea demonstrates that logratio principal components analysis provides a clear separation of data which cannot be obtained by ordinary principal components analysis, and that cluster analysis using logratio principal components gives a much better classification of sediments than does cluster analysis using raw data. The delineation of sedimentary environments on the basis of a logratio classification of sediment samples provides a better understanding of hydrodynamic conditions on the shelf.

8.
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a logratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat covered areas are detected as outliers by a robust analysis, and can be then filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).

9.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers strongly interact: parameters of transformations may be influenced particularly by outliers, and the result of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional datasets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection together with the likelihood of transformed data is presented as a first approach to solve those problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.

10.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data always will be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.
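
The singularity mentioned in this abstract can be checked numerically. The sketch below, using hypothetical three-part percentages and helper names chosen for illustration, computes the sample covariance matrix of closed data and shows that every row sums to zero, so the vector of ones lies in the null space and the determinant vanishes. It does not reproduce the pseudo-inverse algorithms compared in the paper.

```python
def closure(x):
    """Rescale parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [
            sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(d)
        ]
        for j in range(d)
    ]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

# Hypothetical sand / silt / clay percentages, closed to unit sum.
raw = [[12, 30, 58], [20, 25, 55], [8, 40, 52], [15, 35, 50], [30, 20, 50]]
data = [closure(r) for r in raw]

cov = covariance(data)
row_sums = [sum(row) for row in cov]  # each should be zero up to rounding
determinant = det3(cov)               # should be zero up to rounding
print(row_sums, determinant)
```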

11.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has been traditionally plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable, parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions which have now opened up a substantial and meaningful array of statistical methodology for compositional data. From the motivation of a wide variety of practical geological problems we examine the range of possibilities with this new approach to the constant-sum problem.

12.
Hydraulic exponents and unit hydraulic exponents are unit-sum constrained, which requires that they be analyzed by statistical methods designed for compositional data. Though uncertainties remain regarding selection of the best constraining operation and method of handling departures from the unit-sum constraint, neither category of uncertainty should be an impediment to the selection of the appropriate statistical methodology. In a small sample study, the hydraulic geometry of different types of streams was compared: (1) semi-arid: perennial vs. ephemeral; (2) tropical: Puerto Rico vs. West Malaysia; and (3) semi-arid vs. tropical (by pooling the previous data sets). All three comparisons revealed statistically significant differences in either logratio mean vectors or logratio covariance matrices but not both. All six categories of data had logistic normal distributions. Because the derivatives at a given discharge of curvilinear hydraulic geometry relationships and hydraulic exponents on either side of the breakpoints of piecewise linear relationships are unit-sum constrained, they also can be studied by compositional methods. However, the compositional approach is limited in cases where distributions have large departures from logistic normality and for streams that have negative hydraulic exponents.

13.

Problems with compositional data, like spurious correlation and negative bias, are well known in the Geosciences. Not so well known is the fact that the same problems appear when dealing with regionalized compositions. Here, these problems are illustrated, and a solution, based on the principle of working in coordinates using orthonormal logratio representations, is presented. This approach offers a tool for standard geostatistical studies. One of the advantages the method has is that it allows the usual inconsistencies with indicator kriging to be overcome through simplicial indicator kriging. A general way of modelling crossvariograms of coordinates, based on the matrix valued variation variogram, is discussed. In summary, the main aspects related to the modelling and analysis of regionalized compositions have had satisfactory solutions found for them. The proposed methodology is illustrated with public data from a survey concerning arsenic contamination in underground water in Bangladesh.


14.
John C. Butler, Lithos, 1979, 12(1): 33-39
Chemical analyses of igneous rocks have been expressed in a variety of units and many previous workers have expressed a preference for one transformation form or another (oxide weight percentages, molecular amounts, grams per 100 cc, etc.). Transformations which operate on the columns of a data matrix (containing the measured chemical constituents) can be described as simple linear transformations which change the means and variances of the columns but not the coefficients of variation of the columns or the correlation coefficients between columns. Such transformations include changing from oxide weight percentage to molecular amounts or normalizing to some constant value. If the investigator applies a numerical or graphical method of analysis to the original and column transformed forms of the same data set any differences must be attributed to the transformation itself. If the column transformed data are converted to percentage form (such as forming molecular percentages from molecular amounts) all of the summary statistics are likely to change. Transformations which operate on a row of the data matrix (such as multiplication by sample density to convert oxide weight percentages to grams/100 cc) result in changes in the summary statistics of each column and the correlations between all pairs of columns. Equations can be written which express the properties of the columns of the row transformed data matrix in terms of the properties of the non-transformed (parent) data matrix. An investigator may have a preference as to the units in which to express chemical variation(s) of igneous rocks but such preference must be based on petrogenetic criteria and not on a comparison of the statistics of the transformed and non-transformed forms of the same data set.
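
The contrast between column and row transformations can be sketched in a few lines of Python. The oxide values below are hypothetical, the molar masses are approximate, and the two-column closure is an assumed extreme case chosen because its outcome is easy to verify by hand; none of this is taken from the paper's own data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical oxide weight percentages for five rock samples.
sio2 = [45.0, 52.0, 60.0, 68.0, 74.0]
mgo = [9.0, 8.0, 6.5, 7.0, 5.0]

r_raw = pearson(sio2, mgo)

# Column transformation: divide each column by a constant
# (approximate molar masses, giving relative molecular amounts).
sio2_mol = [v / 60.08 for v in sio2]
mgo_mol = [v / 40.30 for v in mgo]
r_scaled = pearson(sio2_mol, mgo_mol)

# Row transformation: re-express each sample as percentages of the
# two-column total, so the two columns always sum to 100.
totals = [a + b for a, b in zip(sio2_mol, mgo_mol)]
sio2_pct = [100.0 * a / t for a, t in zip(sio2_mol, totals)]
mgo_pct = [100.0 * b / t for b, t in zip(mgo_mol, totals)]
r_closed = pearson(sio2_pct, mgo_pct)

print(r_raw, r_scaled, r_closed)
```

Scaling a column by a constant leaves the correlation unchanged, whereas closing two columns to a constant sum forces one to be 100 minus the other, so their correlation becomes -1 whatever the parent data looked like.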

15.
Estimation of regionalized compositions: A comparison of three methods   (total citations: 1; self-citations: 0; cited by others: 1)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant sum constraint, is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.

16.
Common Principal Component Analysis is a generalization of standard principal components to several groups under the rigid mathematical assumption of equality of all latent vectors across groups (i.e., principal component directions), whereas the latent roots are allowed to vary between groups (differing inflations of dispersion ellipsoids). In practice, data that fulfill these strict requirements are relatively rare. Examples from palaeontology are used to illustrate the principles. Compositional data can be made to fit the Common Principal Component (CPC) model by the appropriate logratio covariance matrix.

17.
Logratio Analysis and Compositional Distance   (total citations: 10; self-citations: 0; cited by others: 10)
The concept of distance between two compositions is important in the statistical analysis of compositional data, particularly in such activities as cluster analysis and multidimensional scaling. This paper exposes the fallacies in a recent criticism of logratio-based distance measures, in particular the misstatements that logratio methods destroy distance structures and are denominator dependent. Emphasis is on ensuring that compositional data analysis involving distance concepts satisfies certain logically necessary invariance conditions. Logratio analysis and its associated distance measures satisfy these conditions.
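
To make the invariance conditions concrete, the sketch below evaluates a logratio distance written in terms of all pairwise logratios, so that no single part acts as a privileged denominator, and checks that it is unchanged by rescaling, by a common perturbation, and by reordering the parts. The formula is the usual Aitchison distance; the sample compositions and the perturbation vector are hypothetical, and the specific checks are chosen for illustration rather than taken from the paper.

```python
import math

def aitchison_distance(x, y):
    """Distance built from all pairwise logratios of the parts."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            diff = math.log(x[i] / x[j]) - math.log(y[i] / y[j])
            total += diff ** 2
    return math.sqrt(total / (2 * d))

# Hypothetical three-part compositions.
x = [0.2, 0.3, 0.5]
y = [0.6, 0.3, 0.1]

base = aitchison_distance(x, y)

# Scale invariance: expressing x in different units changes nothing.
scaled = aitchison_distance([100.0 * v for v in x], y)

# Perturbation invariance: multiply both compositions part by part
# by the same positive vector (closure is not needed for this distance).
p = [2.0, 0.5, 4.0]
perturbed = aitchison_distance(
    [a * b for a, b in zip(x, p)],
    [a * b for a, b in zip(y, p)],
)

# Permutation invariance: reorder the parts of both compositions.
permuted = aitchison_distance(x[::-1], y[::-1])

print(base, scaled, perturbed, permuted)
```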

18.
An analysis of statistical expected values for transformations is performed in this study to quantify the effect of heterogeneity on spatial geological modeling and evaluations. Algebraic transformations are frequently applied to data from logging to allow for the modeling of geological properties. Transformations may be powers, products, and exponential operations which are commonly used in well-known relations (e.g., porosity-permeability transforms). The results of this study show that correct computations must account for residual transformation terms which arise due to lack of independence among heterogeneous geological properties. In the case of an exponential porosity-permeability transform, the values may be positive. This proves that a simple exponential model back-transformed from linear regression underestimates permeability. In the case of transformations involving two or more properties, residual terms may represent the contribution of heterogeneous components which occur when properties vary together, regardless of a pair-wise linear independence. A consequence of power- and product-transform models is that regression equations within those transformations need corrections via residual cumulants. A generalization of this result is that transformations of multivariate spatial attributes require multiple-point random variable relations. This analysis provides practical solutions leading to a methodology for nonlinear modeling using correct back transformations in geology.
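
The underestimation described in this abstract is consistent with Jensen's inequality for the exponential function: the exponential of a mean is never larger than the mean of the exponentials. The sketch below illustrates only that general fact with hypothetical log-permeability values; it does not reproduce the residual cumulant corrections developed in the paper.

```python
import math

# Hypothetical natural-log permeability values at five locations.
log_perm = [0.5, 1.0, 2.0, 3.5, 4.0]

# Naive back-transform: exponentiate the average of the logs.
mean_log = sum(log_perm) / len(log_perm)
naive = math.exp(mean_log)

# Average of the back-transformed values themselves.
true_mean = sum(math.exp(v) for v in log_perm) / len(log_perm)

# The gap between the two grows with the spread of the log values.
print(naive, true_mean, true_mean - naive)
```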

19.
Compositional data, consisting of vectors of proportions summing to unity such as the geochemical compositions of rocks, have proved difficult to analyze. Recently, the introduction of logistic and logratio transformations between the d-dimensional simplex and Euclidean space has allowed the use of familiar multivariate methods. The problem of how to model and analyze measurement errors in such data is approached through the concept of a perturbation of a composition. Such modeling allows investigation of the role of rescaling, quantification of measurement error, analysis of observer error, and assessment of the effect of measurement error on inferences.

20.
A Parametric Approach for Dealing with Compositional Rounded Zeros   (total citations: 2; self-citations: 0; cited by others: 2)
In this work, a parametric approach for replacing data below the detection limit, also known as rounded zeros, in compositional data sets is proposed. Compositional rounded zeros correspond to small proportions of some whole that cannot be reliably detected by the analytical instruments under given operating conditions. Zeros of this kind appear frequently in the data collection process in geosciences. They must be treated in an adequate way before some multivariate analysis can be applied. Our procedure results from a modification of the Expectation-Maximization (EM) algorithm and is based on the additive log-ratio transformation. Its coherence with the nature of compositional data and with basic operations in the simplex sample space is checked. Using real data sets, we find that this approach improves on other parametric and non-parametric techniques for compositional rounded zeros.
