Similar articles
1.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data will always be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.
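
The singularity caused by the unit-sum constraint can be checked numerically. The sketch below is only an illustration: the grain-size values are invented, the helper names `closure` and `covariance_matrix` are hypothetical, and it does not reproduce the SAS or Moore–Penrose pseudo-inversion discussed in the abstract. It only shows that every row of the covariance matrix of closed data sums to zero, which is why an ordinary inverse does not exist.

```python
def closure(parts):
    """Rescale positive parts so they sum to 1 (the unit-sum constraint)."""
    total = sum(parts)
    return [p / total for p in parts]


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical sand/silt/clay measurements, invented for illustration only.
raw = [[62.0, 30.0, 8.0], [45.0, 40.0, 15.0], [20.0, 55.0, 25.0], [70.0, 22.0, 8.0]]
closed = [closure(r) for r in raw]
cov = covariance_matrix(closed)

# Each row sums to zero (up to rounding), so the vector of ones lies in the
# null space of the matrix and the matrix cannot be inverted.
row_sums = [sum(row) for row in cov]
```

Because the covariance of any part with the constant total is zero, `row_sums` holds only rounding noise regardless of the data used.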

2.
Grain-size measurements are a type of compositional data and thus subject to closure effects and nonnormality. The logratio transform of Aitchison successfully resolves these problems in compositional data analysis. An application to modern sediment data from the northern part of the South China Sea demonstrates that logratio principal components analysis provides a clear separation of data which cannot be obtained by ordinary principal components analysis, and that cluster analysis using logratio principal components gives a much better classification of sediments than does cluster analysis using raw data. The delineation of sedimentary environments on the basis of a logratio classification of sediment samples provides a better understanding of hydrodynamic conditions on the shelf.

3.
Estimation of regionalized compositions: A comparison of three methods   Cited by: 1 (self-citations: 0, citations by others: 1)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant sum constraint, is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.
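
As a minimal sketch of the additive-log-ratio transformation mentioned in the abstract, the functions below map a composition to unconstrained coordinates and back. The names `alr` and `alr_inverse`, the choice of the last part as divisor, and the sample values are assumptions made for illustration; the paper's own estimation workflow is not reproduced.

```python
import math


def alr(composition):
    """Additive logratio: log of each part relative to the last part."""
    divisor = composition[-1]
    return [math.log(p / divisor) for p in composition[:-1]]


def alr_inverse(coords):
    """Map alr coordinates back to a composition that sums to 1."""
    expanded = [math.exp(c) for c in coords] + [1.0]
    total = sum(expanded)
    return [e / total for e in expanded]


sample = [0.6, 0.3, 0.1]  # hypothetical three-part composition
coords = alr(sample)      # two unconstrained real coordinates
recovered = alr_inverse(coords)
```

Estimates computed on the `alr` coordinates can be back-transformed with `alr_inverse`, which always returns positive parts that satisfy the constant-sum constraint.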

4.
Outlier Detection for Compositional Data Using Robust Methods   Cited by: 6 (self-citations: 2, citations by others: 4)
Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation in the case of compositional data. For the family of logratio transformations (additive, centered and isometric logratio transformation) it is shown that the MDs based on classical estimates are invariant to these transformations, and that the MDs based on affine equivariant estimators of location and covariance are the same for additive and isometric logratio transformation. Moreover, for 3-dimensional compositions the data structure can be visualized by contour lines. In higher dimensions the MDs of closed and opened data give an impression of the multivariate data behavior.
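
The invariance of classical Mahalanobis distances can be illustrated for three-part compositions, where the logratio coordinates are two-dimensional and the covariance matrix can be inverted in closed form. This is a sketch under stated assumptions: the data are invented, the function names are hypothetical, the ilr basis is one particular orthonormal choice, and only the classical mean and covariance are used (no robust estimators).

```python
import math


def alr(x):
    """Additive logratio coordinates of a 3-part composition."""
    return [math.log(x[0] / x[2]), math.log(x[1] / x[2])]


def ilr(x):
    """Isometric logratio coordinates for one orthonormal basis choice."""
    return [
        math.sqrt(1 / 2) * math.log(x[0] / x[1]),
        math.sqrt(2 / 3) * math.log(math.sqrt(x[0] * x[1]) / x[2]),
    ]


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    det = s00 * s11 - s01 ** 2
    distances = []
    for p in points:
        d0, d1 = p[0] - m0, p[1] - m1
        # Quadratic form using the closed-form inverse of a 2x2 matrix.
        q = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
        distances.append(math.sqrt(q))
    return distances


# Hypothetical compositions, invented for illustration only.
data = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1],
    [0.3, 0.3, 0.4],
]
md_alr = mahalanobis_2d([alr(x) for x in data])
md_ilr = mahalanobis_2d([ilr(x) for x in data])
```

The two lists agree up to rounding because the alr and ilr coordinates are related by an invertible linear map, and the classical Mahalanobis distance is unchanged by such maps.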

5.

Problems with compositional data, like spurious correlation and negative bias, are well known in the Geosciences. Not so well known is the fact that the same problems appear when dealing with regionalized compositions. Here, these problems are illustrated, and a solution, based on the principle of working in coordinates using orthonormal logratio representations, is presented. This approach offers a tool for standard geostatistical studies. One advantage of the method is that it allows the usual inconsistencies with indicator kriging to be overcome through simplicial indicator kriging. A general way of modelling crossvariograms of coordinates, based on the matrix valued variation variogram, is discussed. In summary, satisfactory solutions now exist for the main aspects of modelling and analysing regionalized compositions. The proposed methodology is illustrated with public data from a survey concerning arsenic contamination in underground water in Bangladesh.

6.
A variety of approaches to the testing of distributional forms for compositional data has appeared in the literature, all based on logratio or Box–Cox transformation techniques and to a degree dependent on the divisor chosen in the formation of ratios for these transformations. This paper, recognizing the special algebraic–geometric structure of the standard simplex sample space for compositional problems, the use of the fundamental simplicial singular value decomposition, and an associated power-perturbation characterization of compositional variability, attempts to provide a definitive approach to such distributional testing problems. Our main consideration is the characterization and testing of additive logistic–normal form, but we also indicate possible applications to logistic skew normal forms once a full range of multivariate tests emerges. The testing strategy is illustrated with both simulated data and applications to some real geological compositional data sets.

7.
Logratios and Natural Laws in Compositional Data Analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
The impossibility of interpreting correlations of raw compositional components and associated statistical methods has been clearly demonstrated over the last four decades and alternative statistical methodology developed. Despite this, a return to the traditional use of raw components has been advocated recently and alternative methodology such as logratio analysis strongly criticized. This paper exposes the fallacies in this recent advocacy and demonstrates the constructive role that logratio analysis can play in geological compositional problems, in particular in the investigation of natural laws and in subcompositional investigations.

8.
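The key task in exploration geochemistry is to interpret geochemical data correctly, so that anomalies related to mineralization can be extracted accurately from complex geological information and used to guide prospecting. However, geochemical data are compositional data and subject to the closure effect; multivariate statistical methods can be applied, and the true spatial distribution of elements recovered, only after the data have been properly preprocessed. In this study, 1009 surface primary-halo samples were collected from the area south of the periphery of the Ashele copper-zinc mining district and analyzed for 13 trace elements. Exploratory data analysis (EDA) was applied to the raw, log-transformed, and ilr-transformed data to compare their spatial distributions and internal structure. (Robust) principal component analysis, together with compositional biplots and maps of first-principal-component scores, was used to interpret how the element associations indicated by the three types of data relate to mineralization. Multifractal filtering was then applied to the robust principal component scores based on the ilr transformation to separate element-association anomalies from background. The results show that: (1) compared with the raw data, the log- and ilr-transformed data are more evenly scaled and approximately normally distributed; (2) biplots of the three types of data show that only the ilr-transformed data are free of the closure effect, and the element grouping of their first principal component reveals the association of copper mineralization with lead-zinc polymetallic mineralization in the study area; maps of first-principal-component scores based on the log and ilr transformations show that score anomalies from the latter better indicate geological prospecting information in the study area; (3) combining geological prospecting information with the spatial distribution of element-association anomalies and background, three favorable prospecting targets were delineated.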

9.
On criteria for measures of compositional difference   Cited by: 4 (self-citations: 0, citations by others: 4)
Simple perceptions about the nature of compositions lead through logical necessity to certain forms of analysis of compositional data. In this paper the consequences of essential requirements of scale, perturbation and permutation invariance, together with that of subcompositional dominance, are applied to the problem of characterizing change and measures of difference between two compositions. It will be shown that one strongly advocated scalar measure of difference fails these tests of logical necessity, and that one particular form of scalar measure of difference (the sum of the squares of all possible logratio differences in the components of the two compositions), although not unique, emerges as the simplest and most tractable satisfying the criteria.
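
The scalar measure described in parentheses can be sketched as follows. The function names and the example compositions are hypothetical, and counting each unordered pair of components once is an assumption about how "all possible logratio differences" is enumerated. The example checks scale and perturbation invariance numerically.

```python
import math
from itertools import combinations


def logratio_difference(x, y):
    """Sum over pairs i<j of (log(x_i/x_j) - log(y_i/y_j)) squared."""
    return sum(
        (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
        for i, j in combinations(range(len(x)), 2)
    )


def perturb(x, p):
    """Component-wise multiplication; closure is omitted because the
    measure above does not depend on the overall scale."""
    return [a * b for a, b in zip(x, p)]


x = [0.5, 0.3, 0.2]  # hypothetical compositions
y = [0.2, 0.3, 0.5]
shift = [2.0, 0.5, 3.0]

base = logratio_difference(x, y)
scaled = logratio_difference([10 * v for v in x], y)
shifted = logratio_difference(perturb(x, shift), perturb(y, shift))
```

Both `scaled` and `shifted` match `base` up to rounding, since a common factor cancels inside each ratio and a common perturbation cancels inside each difference of logratios.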

10.
Isometric Logratio Transformations for Compositional Data Analysis   Cited by: 37 (self-citations: 0, citations by others: 37)
Geometry in the simplex has been developed in the last 15 years mainly based on the contributions due to J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to get a further insight into some aspects of this geometry in order to clarify the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow for a straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This gives the key to join compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism and orthogonality in the simplex.
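
The metric-preserving property can be illustrated with a short sketch. The orthonormal basis used in `ilr` below is one common choice and is an assumption, as are the function names and the example compositions. The example compares the Euclidean distance between clr vectors with the Euclidean distance between ilr coordinates.

```python
import math


def clr(x):
    """Centered logratio: log parts minus their mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(x):
    """Coordinates in one orthonormal basis: coordinate i contrasts the
    geometric mean of the first i parts with part i+1."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


x = [0.5, 0.3, 0.2]  # hypothetical compositions
y = [0.1, 0.6, 0.3]
d_clr = euclidean(clr(x), clr(y))
d_ilr = euclidean(ilr(x), ilr(y))
```

The two distances coincide up to rounding, while the ilr vector has one coordinate fewer than the clr vector and carries no sum-to-zero constraint.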

11.
Logratio Analysis and Compositional Distance   Cited by: 10 (self-citations: 0, citations by others: 10)
The concept of distance between two compositions is important in the statistical analysis of compositional data, particularly in such activities as cluster analysis and multidimensional scaling. This paper exposes the fallacies in a recent criticism of logratio-based distance measures, in particular the misstatements that logratio methods destroy distance structures and are denominator dependent. Emphasis is on ensuring that compositional data analysis involving distance concepts satisfies certain logically necessary invariance conditions. Logratio analysis and its associated distance measures satisfy these conditions.

12.
Mathematical Geosciences - Even though the logratio methodology provides a range of both generic, mostly exploratory, and purpose-built coordinate representations of compositional data, simple...

13.
BLU Estimators and Compositional Data   Cited by: 5 (self-citations: 0, citations by others: 5)
One of the principal objections to the logratio approach for the statistical analysis of compositional data has been the absence of unbiasedness and minimum variance properties of some estimators: they seem not to be BLU estimators. Using a geometric approach, we introduce the concept of metric variance and of a compositional unbiased estimator, and we show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. Thus, it satisfies analogous properties to the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach used gives real meaning to the concepts of measure of central tendency and measure of dispersion and opens up a new way of understanding the statistical analysis of compositional data.
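
A minimal sketch of the closed geometric mean follows: take the geometric mean of each part across the sample, then rescale the result to sum to 1. The function name and the sample values are hypothetical, and the sketch does not demonstrate the c-BLU property itself.

```python
import math


def closed_geometric_mean(samples):
    """Part-wise geometric mean of the samples, closed to sum to 1."""
    n, d = len(samples), len(samples[0])
    means = [
        math.exp(sum(math.log(s[j]) for s in samples) / n) for j in range(d)
    ]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical three-part compositions, invented for illustration only.
samples = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.7, 0.2, 0.1]]
center = closed_geometric_mean(samples)

# A sample of identical compositions has that composition as its center.
same = closed_geometric_mean([[0.2, 0.3, 0.5]] * 3)
```

The returned `center` is itself a valid composition, which is not guaranteed for quantities computed part by part without the final closure step.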

14.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has been traditionally plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable, parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions which have now opened up a substantial and meaningful array of statistical methodology for compositional data. From the motivation of a wide variety of practical geological problems we examine the range of possibilities with this new approach to the constant-sum problem.

15.
The study of hydrogeochemical data sets frequently calls for statistical dimension reducing techniques. It is well known that hydrochemical parameters are compositions and, for this type of data, the direct application of classical statistical methods based on the correlation matrix yields spurious results. But new results on compositional data analysis have identified the sampling space, the simplex, with a Euclidean space, a fact that allows us to define a simplicial factor analysis strategy, thus overcoming the problem. For illustration, we use samples from the Llobregat River and its tributaries (NE Spain). Three unobservable or latent factorial components are extracted, which are identified with pristine waters, potash-mining influence and urban sewage influence. These three factorial components or compositional factors are plotted in a factorial ternary diagram, which reflects the relative influence of each one of these factors on each observation.

16.
Like compositions in general, regionalized compositions present the problem of spurious spatial correlation. To avoid this problem, this paper uses the additive-logratio transformation of regionalized compositions, following techniques introduced over the last few years for the statistical analysis of compositional data. It leads to an appropriate definition of a spatial covariance structure to describe spatial dependence between regionalized variables subject to constant-sum constraints in the case of weak stationarity. To illustrate stated problems, simulated data are used.

17.
Hydraulic exponents and unit hydraulic exponents are unit-sum constrained, which requires that they be analyzed by statistical methods designed for compositional data. Though uncertainties remain regarding selection of the best constraining operation and method of handling departures from the unit-sum constraint, neither category of uncertainty should be an impediment to the selection of the appropriate statistical methodology. In a small sample study, the hydraulic geometries of different types of streams were compared: (1) semi-arid: perennial vs. ephemeral; (2) tropical: Puerto Rico vs. West Malaysia; and (3) semi-arid vs. tropical (by pooling the previous data sets). All three comparisons revealed statistically significant differences in either logratio mean vectors or logratio covariance matrices but not both. All six categories of data had logistic normal distributions. Because the derivatives at a given discharge of curvilinear hydraulic geometry relationships and hydraulic exponents on either side of the breakpoints of piecewise linear relationships are unit-sum constrained, they also can be studied by compositional methods. However, the compositional approach is limited in cases where distributions have large departures from logistic normality and for streams that have negative hydraulic exponents.

18.
The current theoretical development of the analysis of compositional data in the article by Aitchison and Egozcue dismisses the use of Harker’s variation diagrams and other similar plots as “meaningless” or “useless” for compositional data. In this work, it is shown that variation diagrams essentially are not a correlation tool but a graphical representation of the mass action and mass balance principles in the context of a given geological system, and, when they are used correctly, they provide vital information for the igneous petrologist. The qualitative validity of the “spurious trends” in these diagrams is also shown, when they are interpreted in their proper geological framework. The example previously used by Rollinson to test the usefulness of the log-ratio transformation in the Aitchison and Egozcue article is revisited here in order to fully illustrate the proper use of this tool.

19.
The purpose of this study was to capture the structure of a geological process within a multivariate statistical framework by using geological data generated by that process and, where applicable, by associated processes. It is important to the practitioners of statistical analysis in geology to determine the degree to which the geological process can be captured and explained by multivariate analysis by using sample data (for example, chemical analyses) taken from the geological entity created by that process. The process chosen for study here is the creation of a coal deposit. In this study, the data are chemical analyses expressed in weight percentage and parts per million, and therefore are subject to the effects of the constant sum phenomenon. The data array is the chemical composition of the whole coal. This restriction on the data imposed by the constant sum phenomenon was removed by using the centered logratio (clr) transformation. The use of scatter plots and principal component biplots applied to the raw and centered logratio (clr) transformed data arrays affects the interpretation and comprehension of the geological process of coalification.
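
The centered logratio transformation named in the abstract can be sketched in a few lines. The function name `clr` and the four-part analysis are invented for illustration and are not the coal data of the study. The example shows that expressing the same analysis in weight percent or in parts per million gives identical clr values.

```python
import math


def clr(parts):
    """Centered logratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical analysis in weight percent, invented for illustration only.
weight_percent = [55.0, 30.0, 10.0, 5.0]
# The same analysis in parts per million (1 wt% = 10,000 ppm).
ppm = [p * 10_000 for p in weight_percent]

clr_percent = clr(weight_percent)
clr_ppm = clr(ppm)
```

The clr values always sum to zero, so a covariance matrix computed from them is still singular; methods that need an invertible covariance matrix generally call for a different logratio representation.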

20.
The Devonian/Carboniferous (D/C) boundary is a critical interval in the Phanerozoic history, which is associated with vigorous climatic perturbations, continental glaciation, global sea-level fall and rapidly increased extinction rates in marine realms. In many sections world-wide, these global changes left a marked lithological signature, in particular the Hangenberg black shale (products of deep-shelf anoxia) and the overlying Hangenberg sandstone (sudden siliciclastic influx into predominantly carbonate depositional environments). Both layers bear a distinct geochemical signature. Even though either or both of these two lithologies are absent at many sections, their correlative counterparts can be indicated by subtle geochemical markers. We studied elemental geochemistry of fourteen D/C boundary sections in six key areas across Europe with the aim of selecting a globally correlatable elemental proxy for the D/C boundary. Analysis of raw/log-transformed geochemical data (EDXRF, c.p.s. units), representing the standard approach here, indicates that concentrations of terrigenous elements (Al, K, Rb, Ti and Zr) are mainly controlled by diluted Ca (carried by marine calcium carbonate) in limestone facies and, accordingly, their variations can be related to carbonate production in the sea rather than to terrigenous input from the continent. Nevertheless, due to the relative nature of geochemical observations, reliance solely on statistical processing of raw data might lead to an incomplete picture of multivariate data structure and/or biased interpretations. For this reason, the aim of this contribution is to discuss the logratio alternatives of the standard statistical methods, which may better reflect the relative nature of the data.
For this purpose, principal component analysis was employed to reveal the main geochemical patterns, while the geochemical signature of the D/C boundary was further analysed using Q-mode clustering that leads to predicative orthonormal logratio coordinates – balances. The comprehensive picture of the multivariate data structure provided by these statistical tools makes them a primary choice for exploratory compositional data analysis. At the same time, it turns out that the standard and compositional approaches have synergistic effects. This fact can be extensively used in further geochemical studies.
