Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Isometric Logratio Transformations for Compositional Data Analysis   Total citations: 37 (self-citations: 0, citations by others: 37)
Geometry in the simplex has been developed in the last 15 years mainly based on the contributions due to J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to get a further insight into some aspects of this geometry in order to clarify the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow for a straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This gives the key to join compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism and orthogonality in the simplex.
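
As an illustration of the isometry claim in this abstract, here is a minimal Python sketch. It assumes one particular orthonormal basis (each coordinate contrasts the geometric mean of the first i parts with part i+1); the function names are hypothetical and the paper itself may work with a different basis. The point is only that the Aitchison distance between two compositions equals the ordinary Euclidean distance between their ilr coordinates.

```python
import math

def clr(x):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def ilr(x):
    """ilr coordinates for one assumed orthonormal basis of the simplex.

    Coordinate i is sqrt(i/(i+1)) * ln(gm(x_1..x_i) / x_{i+1}).
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(logs)):
        mean_first = sum(logs[:i]) / i  # log geometric mean of first i parts
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return euclidean_distance(clr(x), clr(y))

x = [0.2, 0.3, 0.5]
y = [0.6, 0.1, 0.3]
# The two distances agree up to floating-point error.
print(aitchison_distance(x, y), euclidean_distance(ilr(x), ilr(y)))
```

A composition with D parts yields D − 1 coordinates, and rescaling all parts by a common factor leaves the coordinates unchanged.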

2.
Groups of Parts and Their Balances in Compositional Data Analysis   Total citations: 7 (self-citations: 0, citations by others: 7)
Amalgamation of parts of a composition has been extensively used as a technique of analysis to achieve reduced dimension, as was discussed during the CoDaWork'03 meeting (Girona, Spain, 2003). It was shown to be a non-linear operation in the simplex that does not preserve distances under perturbation. The discussion motivated the introduction in the present paper of concepts such as group of parts, balance between groups, and sequential binary partition, which are intended to provide tools of compositional data analysis for dimension reduction. Key concepts underlying this development are the established tools of subcomposition, coordinates in an orthogonal basis of the simplex, balancing element and, in general, the Aitchison geometry in the simplex. Main new results are: a method to analyze grouped parts of a compositional vector through the adequate coordinates in an ad hoc orthonormal basis; and the study of balances of groups of parts (inter-group analysis) as an orthogonal projection similar to that used in standard subcompositional analysis (intra-group analysis). A simulated example compares results when testing equal centers of two populations using amalgamated parts and balances; it shows that, in certain circumstances, results from both analyses can disagree.
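
The contrast between balances and amalgamation can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's code: the balance is taken in its usual normalized form, sqrt(rs/(r+s)) times the log ratio of the geometric means of the two groups, and all function names are hypothetical. Perturbing a composition shifts a balance by the same amount whatever the starting composition, whereas the log ratio of amalgamated (summed) parts shifts by an amount that depends on the starting point.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, group_num, group_den):
    """Normalized log ratio of the geometric means of two groups of parts."""
    r, s = len(group_num), len(group_den)
    g_num = geometric_mean([x[i] for i in group_num])
    g_den = geometric_mean([x[i] for i in group_den])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)

def amalgamated_logratio(x, group_num, group_den):
    """Log ratio of the summed (amalgamated) parts of two groups."""
    return math.log(sum(x[i] for i in group_num) / sum(x[i] for i in group_den))

def perturb(x, p):
    """Perturbation: componentwise product followed by closure to sum 1."""
    prod = [a * b for a, b in zip(x, p)]
    total = sum(prod)
    return [v / total for v in prod]

x = [0.1, 0.2, 0.7]
y = [0.5, 0.2, 0.3]
p = [0.5, 0.25, 0.25]
num, den = [0, 1], [2]

# Shift of the balance under perturbation: identical for x and y.
shift_bal_x = balance(perturb(x, p), num, den) - balance(x, num, den)
shift_bal_y = balance(perturb(y, p), num, den) - balance(y, num, den)

# Shift of the amalgamated log ratio: differs between x and y.
shift_am_x = amalgamated_logratio(perturb(x, p), num, den) - amalgamated_logratio(x, num, den)
shift_am_y = amalgamated_logratio(perturb(y, p), num, den) - amalgamated_logratio(y, num, den)

print(shift_bal_x, shift_bal_y)
print(shift_am_x, shift_am_y)
```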

3.
The log ratio methodology converts compositional data, such as concentrations of chemical elements in a rock, from their original Aitchison geometry to interpretable real orthonormal coordinates, thereby allowing meaningful statistical processing and visualization. However, it must be taken into account that the original concentrations can be flawed by detection limit or imprecision problems that can severely affect the resulting coordinates. This paper aims to construct such orthonormal log ratio coordinates, called weighted pivot coordinates, that capture the relevant relative information about an original component and treat the redundant information in a controlled manner. Theoretical developments are supported by a thorough simulation study. Weighted pivot coordinates are then applied to the geochemical mapping of catchment outlet sediments from the National Geochemical Survey of Australia, illustrating their advantage over possible alternatives.

4.
Correlation coefficients are most popular in statistical practice for measuring pairwise variable associations. Compositional data, carrying only relative information, require a different treatment in correlation analysis. For identifying the association between two compositional parts in terms of their dominance with respect to the other parts in the composition, symmetric balances are constructed, which capture all relative information in the form of aggregated logratios of both compositional parts of interest. The resulting coordinates have the form of logratios of individual parts to a (weighted) “average representative” of the other parts, and thus, they clearly indicate how the respective parts dominate in the composition on average. The balances form orthonormal coordinates, and thus, the standard correlation measures relying on the Euclidean geometry can be used to measure the association. Simulation studies provide deeper insight into the proposed approach, and allow for comparisons with alternative measures. An application from geochemistry (Kola moss) indicates that correlations based on symmetric balances serve as a sensitive tool to reveal underlying geochemical processes.

5.
A Critical Approach to Probability Laws in Geochemistry   Total citations: 2 (self-citations: 0, citations by others: 2)
Probability laws in geochemistry have been a major issue of concern over the last decades. The lognormal on the positive real line or the additive logistic normal on the simplex are two classical laws of probability to model geochemical data sets due to their association with a relative measure of difference. This fact is not fully exploited in the classical approach when viewing both the positive real line and the simplex as subsets of real space with the induced geometry. But it can be taken into account considering them as real linear vector spaces with their own structure. This approach implies using a particular geometry and a measure different from the usual ones. Therefore, we can work with the coordinates with respect to an orthonormal basis. It can be shown that the two mentioned laws are associated with a normal distribution on the coordinates. In this contribution both approaches are compared, and a real data set is used to illustrate similarities and differences.

6.
BLU Estimators and Compositional Data   Total citations: 5 (self-citations: 0, citations by others: 5)
One of the principal objections to the logratio approach for the statistical analysis of compositional data has been the absence of unbiasedness and minimum variance properties of some estimators: they seem not to be BLU estimators. Using a geometric approach, we introduce the concept of metric variance and of a compositional unbiased estimator, and we show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. Thus, it satisfies analogous properties to the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach used gives real meaning to the concepts of measure of central tendency and measure of dispersion and opens up a new way of understanding the statistical analysis of compositional data.
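
The closed geometric mean mentioned in this abstract is easy to compute. The sketch below uses hypothetical function names and made-up sample values; it takes the geometric mean of each part across the sample and closes the result to sum 1. It also checks the analogy with the arithmetic mean: the clr transform of the center equals the arithmetic mean of the clr-transformed observations.

```python
import math

def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def center(compositions):
    """Closed geometric mean of a sample of compositions."""
    n = len(compositions)
    n_parts = len(compositions[0])
    gm = [
        math.exp(sum(math.log(c[j]) for c in compositions) / n)
        for j in range(n_parts)
    ]
    return closure(gm)

# Illustrative (made-up) sample of three 3-part compositions.
sample = [[0.1, 0.3, 0.6], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]]
c = center(sample)

# Arithmetic mean of the clr vectors, part by part.
mean_clr = [sum(clr(s)[j] for s in sample) / len(sample) for j in range(3)]
print(c)
print(clr(c), mean_clr)
```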

7.
Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, hindering their interpretation. For interpretative purposes, it would be better to have each basis component as a ratio or balance of the geometric means of two groups of parts, leaving irrelevant parts with a zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances which successively maximize the explained variance in a data set. The new algorithm to compute principal balances requires an exhaustive search along all the possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other suboptimal, but feasible, algorithms are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) the hierarchical cluster analysis of variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
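
To make the variation matrix concrete, the following Python sketch computes it for a small made-up data set and finds the pair of parts with the smallest log-ratio variance, which is the pair a hierarchical clustering of variables would plausibly merge first. This illustrates the ingredient named in the abstract, not the authors' algorithm; the function names and the use of the population variance are assumptions.

```python
import math
from statistics import pvariance

def variation_matrix(data):
    """Entry (i, j) is the variance of ln(x_i / x_j) across observations."""
    n_parts = len(data[0])
    T = [[0.0] * n_parts for _ in range(n_parts)]
    for i in range(n_parts):
        for j in range(i + 1, n_parts):
            logratios = [math.log(row[i] / row[j]) for row in data]
            T[i][j] = T[j][i] = pvariance(logratios)
    return T

def closest_pair(T):
    """Indices (i, j), i < j, of the two parts with the smallest variation."""
    n_parts = len(T)
    pairs = [(T[i][j], i, j) for i in range(n_parts) for j in range(i + 1, n_parts)]
    _, i, j = min(pairs)
    return i, j

# Made-up data: part 1 is always twice part 0, so their log ratio is constant.
data = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 10.0], [1.5, 3.0, 2.0]]
T = variation_matrix(data)
print(closest_pair(T))
```

Parts that stay proportional to each other have zero variation, so they end up in the same group before any others.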

8.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has been traditionally plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable, parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions which have now opened up a substantial and meaningful array of statistical methodology for compositional data. From the motivation of a wide variety of practical geological problems we examine the range of possibilities with this new approach to the constant-sum problem.

9.

Compositional data carry their relevant information in the relationships (logratios) between the compositional parts. It is shown how this source of information can be used in regression modeling, where the composition could either form the response, or the explanatory part, or even both. An essential step in setting up a regression model is how the composition(s) enter the model. Here, balance coordinates will be constructed that support an interpretation of the regression coefficients and allow for testing hypotheses of subcompositional independence. Both classical least-squares regression and robust MM regression are treated, and they are compared within different regression models on a real data set from a geochemical mapping project.


10.
Compositional Geometry and Mass Conservation   Total citations: 1 (self-citations: 0, citations by others: 1)
A geometrical structure is imposed on compositional data by physical and chemical laws, principally mass conservation. Therefore, statistical or mathematical investigation of possible relations between data values and such laws must be consistent with this structure. This demands that geometrical concepts, such as points that specify both mass and composition in linear space, and lines in projective space that specify composition only, be clearly defined and consistent with mass conservation. Mass thus becomes the norm in composition space in place of the Euclidean norm of ordinary space. Coordinate transformations inconsistent with this geometry are accordingly unnatural and misleading. They are also unnecessary because correlation arising from the constant mass presents no unusual difficulty in the analysis of the underlying quadratic form.

11.
The current theoretical development of the analysis of compositional data in the article by Aitchison and Egozcue dismisses Harker’s variation diagrams and other similar plots as “meaningless” or “useless” on compositional data. In this work, it is shown that variation diagrams essentially are not a correlation tool but a graphical representation of the mass action and mass balance principles in the context of a given geological system, and, when they are used correctly, they provide vital information for the igneous petrologist. The qualitative validity of the “spurious trends” in these diagrams is also shown, when they are interpreted in their proper geological framework. The example previously used by Rollinson to test the usefulness of the log-ratio transformation in the Aitchison and Egozcue article is revisited here in order to fully illustrate the proper use of this tool.

12.
Out-of-equilibrium crystallization often produces complex compositional variability in minerals, generating zoning and other mixing phenomena. The appropriate microchemical characterization of the resulting out-of-equilibrium patterns is of critical importance in understanding the overall physical and chemical properties of the host crystalline phases. In this framework, the modeling of compositional changes assumes a fundamental role. However, when compositional data are used, their management with standard exploratory, statistical, graphical, and numerical tools may give misleading results attributable to the phenomenon of induced correlations. To avoid these problems, methods able to extract compositional data from their constrained space (the simplex) in order to apply standard statistics have to be adopted. As an alternative, the use of tools having properties able to work in the simplex geometry has to be considered. A luzonite single crystal (ideal composition, Cu3AsS4) exhibiting concentric and sector zoning was studied using electron probe microanalysis in order to understand the mechanisms which give rise to chemical variability and conditions in the developing environment. Compositional variations were determined by collecting data along three different transects. The major and minor elements (Cu, As, S, Fe, Sb, Sn) were analyzed with the aim of characterizing their patterns of association in the crystal and, hence, crystal evolution. The whole covariance structure as well as the chemical relationships between the successive zones was investigated by means of compositional methods, considering both data transformation and the stay in the simplex approach. Results indicate that the crystal grew under quiescent conditions, where chemical control was primarily exercised by the mineral’s surface and only minor effects were due to changes in the composition of the surrounding fluid. Consequently, an oscillatory uptake of chemical components occurred in which a competition between famatinite-like (Cu3SbS4) and kuramite-like (Cu3SnS4) domains characterized the As-poor zones.

13.
Chemical reactions in aqueous geochemical systems are driven by nonequilibrium conditions, and their dynamics can be deduced through the distributional analysis (identification of probability laws) of complex compositional indices. In this perspective, compositional data analysis offers the possibility to investigate the behavior of the composition as a whole instead of isolated chemical species, with the awareness that multispecies systems are characterized by the simultaneous interactions among all their parts. We addressed this problem using D − 1 isometric log-ratio coordinates describing the D-part compositional dataset of the river chemistry of the Alpine region (D being the number of variables), thus working in the \(\mathbb{R}^{D-1}\) statistical sample space. The D − 1 coordinates were chosen using the decreasing variance criterion so that each one could provide information about different space–time properties for the investigated geochemical system. Coordinates dominated by heterogeneity appear to be able to capture regime shifts only on a long-time period and monitor processes on a very wide scale. On the other hand, coordinates characterized by lower variability present multimodality, thus capturing the presence of alternative states in the analyzed spatial domain also for the current time. Further developments are needed to determine the ranges of conditions for which variability and other statistics can be useful indicators of regime shifts on different time–space scales in geochemical systems.

14.
Mathematical Geosciences - In the approach to compositional data analysis originated by John Aitchison, a set of linearly independent logratios (i.e., ratios of compositional parts, logarithmically...

15.
New Perspectives on Water Chemistry and Compositional Data Analysis   Total citations: 3 (self-citations: 0, citations by others: 3)
Water chemistry is commonly investigated to determine the suitability of water for various uses. With increased knowledge of aqueous chemistry, it has become possible to interpret the evolutionary processes that determine water composition and quality. This paper presents procedures for exploring and modeling the environment using compositional data from water analysis, utilizing statistical tools in an appropriate sample space. Our procedures build on a methodology based on log-ratios initiated by John Aitchison in the early 1980s. They are not only useful for interpreting the structure of the data, but also for characterizing and modeling the influence of geochemical processes acting on the environment. The geochemistry of water samples collected from wells on Vulcano Island (one of the Aeolian Islands of the Italian province of Sicily) will be used to illustrate the techniques, although an exhaustive overview would require many different examples. Vulcano Island is a quiescent volcanic area where mobilization of chemical species by weathering of volcanic rocks and input of gaseous components from fumarolic activity has produced environmental changes expressed in the composition of phreatic waters at the surface and in the shallow subsurface. Changes in the chemical composition of waters in unconfined aquifers of the northwestern part of the island around the active crater appear to be useful in understanding the natural processes at work.

16.
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability and the possible processes associated with compositional data sets from many disciplines. In this paper, we concentrate on geochemical data. First, we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of sub-compositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, such as the singular value decomposition of a compositional data set. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major oxide and trace element compositions of metamorphosed limestones from the Grampian Highlands of Scotland. Finally, we discuss some unresolved problems in the statistical analysis of compositional processes.

17.
Estimation of regionalized compositions: A comparison of three methods   Total citations: 1 (self-citations: 0, citations by others: 1)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant sum constraint, is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.
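
The additive-log-ratio route in method (3) can be sketched as follows. This is a minimal illustration with hypothetical function names, using the last part as the reference (a convention assumed here). The spatial estimation step is replaced by a plain average of two made-up samples. The property illustrated is that any real vector produced in alr space maps back to positive parts summing to 1, so the back-transformed estimate is always a valid composition.

```python
import math

def alr(x):
    """Additive logratio: log of each part relative to the last part."""
    reference = x[-1]
    return [math.log(v / reference) for v in x[:-1]]

def alr_inverse(y):
    """Map real alr coordinates back to a composition summing to 1."""
    expanded = [math.exp(v) for v in y] + [1.0]
    total = sum(expanded)
    return [v / total for v in expanded]

# Two made-up sample compositions at neighbouring locations.
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]

# Stand-in for spatial estimation: average the alr coordinates.
estimate_alr = [(u + v) / 2 for u, v in zip(alr(a), alr(b))]
estimate = alr_inverse(estimate_alr)
print(estimate)
```

Averaging in alr space is only a placeholder for kriging or cokriging of the transformed variables; the back-transform step is the same either way.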

18.
Omitting variables in compositional data analysis may lead to a substantial change in results from that of multivariate statistical analysis. In particular, this is the case for principal component analysis and the compositional biplot, where both the interpretation of loadings and scores of the remaining subcomposition are affected. A stepwise procedure is introduced that allows for a reduction of the original composition to a subcomposition by avoiding a substantial change of the information, like those carried by the compositional biplot. The subcomposition is easier to handle and interpret. Numerical results give evidence of the usefulness of the procedure.

19.
Differential Models for Evolutionary Compositions   Total citations: 1 (self-citations: 1, citations by others: 0)
General systems are frequently decomposable into parts and these parts can evolve in time or space, a frequent occurrence in the field of Geosciences. In most cases, fitting models to forecast future states of the system is a goal of the analysis. Modelling interactions between parts may also be of common interest. The system can be analysed from different points of view; the traditional one consists in modelling each part of the system in time. Alternatively, modelling the evolution of the parts as proportions is proposed herein and attention is centred on the compositional evolution. The compositions are expressed in orthogonal coordinates (ilr) and then modelled using first-order differential equations with constant coefficients. Simple models are shown to be very flexible, including many of the standard growth curve models. The models are fitted using regression techniques on the integrated coordinates. The use and interpretation of these differential models is illustrated with several examples: a simulated example; urban waste in Catalonia (Spain); oil production and reserves; and growth of a luzonite crystal.
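
As a hedged illustration of the simplest model of this kind (the notation below is assumed, not taken from the paper): if the ilr coordinates \(\mathbf{z}(t)\) have a constant derivative \(\mathbf{b}\), the coordinates evolve along a straight line, which corresponds to each part of the composition growing or decaying exponentially before closure \(\mathcal{C}\).

```latex
\[
\frac{d\mathbf{z}}{dt} = \mathbf{b}
\quad\Longrightarrow\quad
\mathbf{z}(t) = \mathbf{z}_0 + t\,\mathbf{b},
\]
\[
\mathbf{x}(t) = \mathcal{C}\!\left[\,x_{0,1}\,e^{\lambda_1 t},\ \dots,\ x_{0,D}\,e^{\lambda_D t}\right],
\qquad
\mathbf{b} = \operatorname{ilr}\!\left(\mathcal{C}\!\left[e^{\lambda_1},\dots,e^{\lambda_D}\right]\right).
\]
```

Here \(\mathbf{z}_0 = \operatorname{ilr}(\mathbf{x}_0)\) and the \(\lambda_i\) are per-part growth rates; only their differences matter, since a common rate cancels under closure.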

20.
Large rivers are a major pathway for the erosion products of continents to reach the oceans. The riverine transport of dissolved and particulate materials is generally related to a large number of interactions involving climate, hydrological, physico-chemical and biological aspects. Consequently, the investigation of large rivers allows the erosion processes at a global scale to be addressed, with information about biogeochemical cycles of the elements, weathering rates, physical erosion rates and CO2 consumption by the acid degradation of continental rocks. Today, good databases exist for the major dissolved ions in the world’s largest rivers. Since concentration of ions in river waters has to be considered in a compositional context, it is necessary to study the implications of considering the simplex, with its proper geometry, as the natural sample space. Using the additive (alr) or the isometric (ilr) log-ratio transformations, a composition can be represented as a real vector; but only in the second case can these coordinates be mapped onto orthogonal axes. Using data related to the dissolved load of some of the most important rivers in the world, the relationships among the major ions frequently used in molar ratio mixing diagrams have been investigated with alternative tools. Following the balances approach, an investigation of the properties of aqueous solutions of electrolytes that are often treated in terms of equilibrium constants is undertaken. The role played by the source (rain water, weathering of silic, carbonatic and evaporitic rocks, pollution) from which elements and chemical species can potentially be derived has been checked through an investigation of a probabilistic model able to describe the relationships among the different components that contribute to the chemical composition of a river water sample.
