Similar literature (20 documents)
1.
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability and the possible processes associated with compositional data sets from many disciplines. In this paper, we concentrate on geochemical data. First, we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of sub-compositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, such as the singular value decomposition of a compositional data set. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major oxide and trace element compositions of metamorphosed limestones from the Grampian Highlands of Scotland. Finally, we discuss some unresolved problems in the statistical analysis of compositional processes.
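The perturbation operation behind hypotheses of "specific perturbational change" can be sketched in a few lines. This is a minimal illustration assuming Aitchison's standard definitions (closure to unit sum; perturbation as a componentwise product followed by closure); the function names and the three-part compositions are hypothetical and are not taken from the Grampian data set.

```python
def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total` (Aitchison's closure C)."""
    s = sum(x)
    return [total * v / s for v in x]


def perturb(x, p):
    """Perturbation x (+) p: multiply componentwise, then close."""
    return closure([a * b for a, b in zip(x, p)])


def perturbation_difference(x, y):
    """Return the perturbation p that carries composition x to y (y = x (+) p)."""
    return closure([b / a for a, b in zip(x, y)])


# Hypothetical 3-part compositions (e.g. CaO, MgO, SiO2 in wt%), not from the paper.
x = closure([60.0, 30.0, 10.0])
y = closure([50.0, 35.0, 15.0])
p = perturbation_difference(x, y)
print([round(v, 3) for v in p])  # the change from x to y, itself a composition
print(perturb(x, p))  # recovers y up to rounding error
```

Because the change p is itself a composition, hypotheses about a "specific perturbational change" can be framed as hypotheses about p.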

2.
The study of hydrogeochemical data sets frequently calls for statistical dimension-reducing techniques. It is well known that hydrochemical parameters are compositions and that, for this type of data, directly applying classical statistical methods based on the correlation matrix yields spurious results. However, recent results in compositional data analysis have identified the sample space, the simplex, with a Euclidean space, which allows us to define a simplicial factor analysis strategy and thus overcome the problem. For illustration, we use samples from the Llobregat River and its tributaries (NE Spain). Three unobservable (latent) factorial components are extracted and identified with pristine waters, potash-mining influence and urban sewage influence. These three factorial components, or compositional factors, are plotted in a factorial ternary diagram, which shows the relative influence of each factor on each observation.

3.
Yu Xianchuan, Zhang Guanpeng, Yao Wang. Jiangsu Geology (《江苏地质》), 2019, 43(1): 103-110
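Correlation analysis between elements in geochemical data is of great importance. Geochemical data are compositional data that do not follow a normal distribution, and their closure property makes them difficult to mine and analyze, so many traditional statistical methods are unsuitable. This paper analyzes the correlations between geochemical elements mainly through the trends of element values and proposes the concept of a morphological correlation coefficient. The method does not require the data to follow a normal distribution and can ignore the effect of closure. Experiments show that the method is simple, stable and accurate, and that it can reveal the relationships between elements in the data. In addition, the computation involves no backtracking, so the method is suitable for real-time and dynamic analysis of big data.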

4.
Compositional data analysis
Compositional data occur naturally in the geosciences: tables of chemical analyses, rock compositions, sedimentary proportions, pollen-analytical tables, etc. The statistical analysis of such data requires special techniques; standard methods of computing correlation coefficients and carrying out multivariate statistical analyses cannot be used without the risk of incurring grave mistakes. The special property of compositional data, namely that the determinations on each specimen sum to a constant, means that the variables involved in the study occur in a constrained space defined by the simplex, a restricted part of real space.
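Why the constant sum matters can be shown with simulated data. The sketch below (an illustration, not from the paper) draws three mutually independent positive variables, closes each row to 100 %, and checks an identity that follows from the constraint: since every part has zero covariance with the constant total, the off-diagonal covariances must sum to minus the sum of the variances, so at least one covariance is forced to be negative. The seed, distribution and sample size are arbitrary assumptions.

```python
import random


def closure(row, total=100.0):
    """Rescale a row of positive values so that it sums to `total` (e.g. 100 %)."""
    s = sum(row)
    return [total * v / s for v in row]


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


random.seed(1)
# Three mutually independent positive "abundances" (simulated, not real data).
raw = [[random.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(500)]
closed = [closure(r) for r in raw]  # every row now sums to 100

cols = list(zip(*closed))  # one tuple of values per part
D = len(cols)
variances = [cov(c, c) for c in cols]
cross = [cov(cols[j], cols[k]) for j in range(D) for k in range(D) if j != k]

# Each part has zero covariance with the constant total, so the off-diagonal
# covariances must cancel the variances: at least one of them is negative.
print(sum(cross), -sum(variances))
```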

5.
Logratio Analysis and Compositional Distance
The concept of distance between two compositions is important in the statistical analysis of compositional data, particularly in such activities as cluster analysis and multidimensional scaling. This paper exposes the fallacies in a recent criticism of logratio-based distance measures, in particular the misstatements that logratio methods destroy distance structures and are denominator dependent. Emphasis is on ensuring that compositional data analysis involving distance concepts satisfies certain logically necessary invariance conditions. Logratio analysis and its associated distance measures satisfy these conditions.
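A logratio distance can be sketched as the Euclidean distance between centred log-ratio (clr) vectors, i.e. the Aitchison distance. The abstract does not list its invariance conditions, so the three checked below (scale, permutation and perturbation invariance) are an assumed selection of properties commonly cited for this distance; the compositions and the perturbation are hypothetical.

```python
import math


def clr(x):
    """Centred log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr vectors, i.e. the Aitchison distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


x = [0.2, 0.3, 0.5]
y = [0.1, 0.6, 0.3]
d = aitchison_distance(x, y)
print(d)
print(aitchison_distance([20, 30, 50], y))  # scale invariance: equals d up to rounding
print(aitchison_distance(x[::-1], y[::-1]))  # permutation invariance: equals d up to rounding
p = [2.0, 0.5, 1.5]  # an arbitrary perturbation
print(aitchison_distance([a * b for a, b in zip(x, p)],
                         [a * b for a, b in zip(y, p)]))  # perturbation invariance
```

Unlike a ratio to one chosen part, the clr uses the geometric mean of all parts as the divisor, so the result does not depend on which part is singled out as a denominator.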

6.
In this paper, we provide a detailed account of our sample fusion, calibration and instrumentation methods for major-element whole-rock analysis by XRF, and we discuss several aspects of sample preparation and instrument performance that are important considerations for accurate analysis. The fusion procedure involves moderate capital costs and is easy to apply, yielding flat, polished, homogeneous glass discs as cast. The calibration method utilizes a least-squares procedure that rigorously fits data according to both compositional and counting statistical uncertainties. We use a side-window Rh tube for analyzing major elements (including Na) and employ real-time testing for constant count rate to reject spurious results.

The methods result in excellent analytical precision and reproducibility. The standards used for calibration lie within compositional and counting statistical uncertainties of best-fit straight lines. Analyses of replicate discs and repeated analyses of single discs show excellent long-term reproducibility over several months, approaching counting statistical uncertainties in several cases. Comparison with independent measurements made by other laboratories using instrumental neutron activation and X-ray fluorescence analyses shows excellent agreement with our results.

A side-window Rh tube gives increased detection limits for most major elements but otherwise shows little difference in precision compared with a Cr tube. This means that major and trace elements can be analyzed without changing X-ray sources, which saves time and money and is a convenience to the analyst.


7.
On the Interpretation of Orthonormal Coordinates for Compositional Data
The simplex with the Aitchison geometry is a natural sample space for compositional data, that is, observations carrying only relative information (especially proportions, percentages, etc., often occurring in the geosciences). For this reason, standard statistical methods that rely on the Euclidean structure of real space cannot be used directly for statistical analysis. First, compositional data need to be expressed in coordinates with respect to an orthonormal basis on the simplex (with respect to the Aitchison geometry). The mathematical interpretation of the orthonormal coordinates is derived from the procedure by which they are constructed (called sequential binary partition), and they act as balances between groups of compositional parts. The goal of this paper is to describe the covariance structure of the coordinates and, consequently, to provide a complementary interpretation based on log-ratios of parts of the original composition. Note that, in a composition, the ratios themselves contain all the relevant information. The possibilities and limitations of this approach are demonstrated through illustrative examples.
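The balances built from a sequential binary partition can be computed directly. This sketch assumes the usual balance formula for a split with r parts in the numerator group and s in the denominator group, sqrt(rs/(r+s)) times the log of the ratio of the two groups' geometric means; the partition coding (+1, -1, 0), the function names and the example composition are illustrative assumptions.

```python
import math


def gmean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balances(x, sbp):
    """ilr coordinates (balances) of composition x for a sequential binary partition.

    Each row of `sbp` codes one split: +1 = numerator group, -1 = denominator
    group, 0 = part not involved at this step.
    """
    coords = []
    for row in sbp:
        num = [v for v, sign in zip(x, row) if sign == 1]
        den = [v for v, sign in zip(x, row) if sign == -1]
        r, s = len(num), len(den)
        coords.append(math.sqrt(r * s / (r + s)) * math.log(gmean(num) / gmean(den)))
    return coords


# Hypothetical 3-part composition and partition: {x1, x2} vs {x3}, then x1 vs x2.
x = [0.5, 0.3, 0.2]
sbp = [[1, 1, -1],
       [1, -1, 0]]
print(balances(x, sbp))
```

Because the basis is orthonormal, the sum of squared balances equals the squared Aitchison norm of x (the sum of its squared clr values), so Euclidean methods applied to the balances respect the simplex geometry.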

8.
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to the calibration of geochemical data and to multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of both terrestrial and marine sedimentary sequences has required core-scanning X-ray fluorescence (XRF) spectrometry. Initial XRF data are generally unusable in 'raw' format and require processing to remove instrument bias, as well as informed sequence interpretation. The applicability of conventional calibration equations to core-scanning XRF data is further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio-based calibration schemes have been developed and applied to clastic sedimentary sequences, focusing mainly on energy-dispersive XRF (ED-XRF) core scanning. This study applied high-resolution core-scanning XRF to Holocene sedimentary sequences from the tide-dominated Indian Sundarbans (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a subset of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio-transformed geochemical data. Partial least squares (PLS) modelling of geochemical and grain-size data makes it possible to derive robust proxy information for the Sundarbans depositional environment. Applying these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
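The regression step can be sketched as follows. We assume, without the abstract confirming the exact parameterisation, that the calibration takes the form ln(W_E/W_R) = a + b ln(I_E/I_R), where I are XRF intensities, W reference concentrations, E the element of interest and R a reference element. The reduced major axis (RMA) fit uses slope sign(r) times sd(y)/sd(x). The function name and all numbers are hypothetical.

```python
import math


def rma_fit(x, y):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return my - slope * mx, slope  # (intercept, slope)


# Hypothetical calibration subset: XRF counts and reference concentrations for an
# element E and a reference element R measured on the same specimens.
counts_E = [1200, 1850, 2600, 3100]
counts_R = [9000, 8800, 8500, 8300]
conc_E = [1.1, 1.8, 2.7, 3.4]
conc_R = [24.0, 23.5, 22.8, 22.1]

x = [math.log(e / r) for e, r in zip(counts_E, counts_R)]  # log-ratio of intensities
y = [math.log(e / r) for e, r in zip(conc_E, conc_R)]      # log-ratio of concentrations
a, b = rma_fit(x, y)
print(a, b)
```

RMA treats both variables symmetrically: swapping x and y gives the reciprocal slope, which suits a calibration in which both the scanner and the reference log-ratios carry measurement error.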

9.
Out-of-equilibrium crystallization often produces complex compositional variability in minerals, generating zoning and other mixing phenomena. Appropriate microchemical characterization of the resulting out-of-equilibrium patterns is of critical importance for understanding the overall physical and chemical properties of the host crystalline phases. In this framework, the modeling of compositional changes plays a fundamental role. However, when compositional data are handled with standard exploratory, statistical, graphical and numerical tools, the results may be misleading because of induced correlations. To avoid these problems, methods that move compositional data out of their constrained space (the simplex) so that standard statistics can be applied must be adopted; alternatively, tools designed to work in the simplex geometry should be considered. A luzonite single crystal (ideal composition Cu₃AsS₄) exhibiting concentric and sector zoning was studied by electron probe microanalysis in order to understand the mechanisms that give rise to chemical variability and the conditions in the growth environment. Compositional variations were determined by collecting data along three different transects. The major and minor elements (Cu, As, S, Fe, Sb, Sn) were analyzed with the aim of characterizing their patterns of association in the crystal and, hence, the crystal's evolution. The whole covariance structure, as well as the chemical relationships between successive zones, was investigated by means of compositional methods, considering both data transformation and the staying-in-the-simplex approach. Results indicate that the crystal grew under quiescent conditions, where chemical control was exercised primarily by the mineral's surface and only minor effects were due to changes in the composition of the surrounding fluid.
Consequently, an oscillatory uptake of chemical components occurred, in which competition between famatinite-like (Cu₃SbS₄) and kuramite-like (Cu₃SnS₄) domains characterized the As-poor zones.

10.
Omitting variables in compositional data analysis may substantially change the results of multivariate statistical analysis. This is particularly true for principal component analysis and the compositional biplot, where the interpretation of both the loadings and the scores of the remaining subcomposition is affected. A stepwise procedure is introduced that reduces the original composition to a subcomposition while avoiding a substantial change in the information, such as that carried by the compositional biplot. The subcomposition is easier to handle and interpret. Numerical results give evidence of the usefulness of the procedure.

11.
Estimation of regionalized compositions: A comparison of three methods
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant-sum constraint, is a spurious spatial correlation and may lead to misinterpretation of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of estimating or predicting a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; and (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We briefly review the theory of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.
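The additive-log-ratio route in method (3) rests on a transform and its inverse, sketched below. The idea, as we read it, is to estimate in alr coordinates (for example by kriging each coordinate) and then back-transform; the kriging step itself is not shown, and the estimate used here is a made-up placeholder. Whatever real numbers the estimator returns, the back-transform yields positive parts with the required constant sum, which separate estimation of the raw parts does not guarantee.

```python
import math


def alr(x):
    """Additive log-ratio transform: log of each of the first D-1 parts over the last."""
    return [math.log(v / x[-1]) for v in x[:-1]]


def alr_inv(y, total=1.0):
    """Inverse alr: exponentiate, append 1 for the divisor part, then close to `total`."""
    e = [math.exp(v) for v in y] + [1.0]
    s = sum(e)
    return [total * v / s for v in e]


# Hypothetical 3-part composition (e.g. fractions of three components at one site).
z = [0.2, 0.3, 0.5]
print(alr(z))
# Any real-valued estimate of the alr coordinates, e.g. one obtained by kriging
# each coordinate (hypothetical numbers here), maps back to a valid composition.
estimate = [-1.3, 0.7]
print(alr_inv(estimate, total=100.0))
```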

12.
Hydraulic exponents and unit hydraulic exponents are unit-sum constrained, so they must be analyzed with statistical methods designed for compositional data. Although uncertainties remain regarding the choice of the best constraining operation and the method of handling departures from the unit-sum constraint, neither category of uncertainty should impede the selection of the appropriate statistical methodology. In a small-sample study, the hydraulic geometry of different types of streams was compared: (1) semi-arid: perennial vs. ephemeral; (2) tropical: Puerto Rico vs. West Malaysia; and (3) semi-arid vs. tropical (by pooling the previous data sets). All three comparisons revealed statistically significant differences in either the logratio mean vectors or the logratio covariance matrices, but not both. All six categories of data had logistic normal distributions. Because the derivatives of curvilinear hydraulic geometry relationships at a given discharge, and the hydraulic exponents on either side of the breakpoints of piecewise linear relationships, are unit-sum constrained, they can also be studied by compositional methods. However, the compositional approach is limited where distributions depart strongly from logistic normality and for streams that have negative hydraulic exponents.

13.
Thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources.

Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA.

The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) salinity and increased water-rock interaction; 2) dissolution of carbonates; and 3) dissolution of sulfides, sulfates and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods, or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes governing the hydrochemistry of a group of thermal springs in a low-enthalpy setting.
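A common CoDa preprocessing for PCA is to clr-transform each sample and then extract principal components from the covariance matrix of the clr data. The abstract does not specify the exact preprocessing, so this pipeline is an assumption. The sketch finds only the first component by power iteration, to stay dependency-free (in practice one would run an SVD on the centred clr matrix); the four-ion analyses are hypothetical, not the Dublin Basin data.

```python
import math


def clr(x):
    """Centred log-ratio transform of one composition."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(u[i] * u[j] for u in c) / (n - 1) for j in range(d)] for i in range(d)]


def first_pc(cov, iters=1000):
    """Leading eigenpair by power iteration."""
    d = len(cov)
    # Do not start from the constant vector: it spans the null space of a clr covariance.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    lam = sum(a * b for a, b in zip(v, cv))  # Rayleigh quotient
    return lam, v


# Hypothetical 4-ion water analyses (mg/L), not the Dublin Basin data.
samples = [[10, 20, 30, 40], [12, 18, 31, 39], [20, 15, 30, 35],
           [30, 10, 29, 31], [15, 17, 33, 35]]
cov = covariance_matrix([clr(s) for s in samples])
lam, loadings = first_pc(cov)
trace = sum(cov[i][i] for i in range(len(cov)))
print(f"PC1 explains {lam / trace:.1%} of the clr variance; loadings {loadings}")
```

Because every clr vector sums to zero, the clr loadings also sum to zero, so each component reads as a contrast between groups of ions rather than as an overall concentration effect.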

14.
This paper addresses three intractable difficulties associated with the statistical analysis of compositional data, such as percentages or ppm: (1) such data do not follow multivariate normal distributions, which renders standard parametric statistical tests and estimation procedures inappropriate; (2) the covariance and correlation coefficients between specific pairs of components are determined wholly or partly by the presence or absence of other components; and (3) the negative bias property, that is, at least one covariance, and therefore at least one correlation, must be negative, so the remaining correlations cannot range freely between −1 and +1. It follows that correlation coefficients formed from compositional data are not only not absolute but also frequently spurious. Standard multivariate procedures based on them are unreliable, and intrinsic associations between components inferred from strong positive correlations in particular are potentially false. A 2009 paper reported that 59 surface sediment samples from 7 regions in the Polish exclusive economic zone had been chemically analyzed for 16 elements, and presented enrichment factors together with crude correlation coefficients between selected elements. All these quantities were computed from the initial raw compositional data resulting from the chemical analyses. In this paper, a statistical procedure is presented that is distinctly different from the enrichment-factor computations based on the same raw compositional data. The procedure generates a log-ratio measure of the abundance of each element in each of the seven regions, thus enabling comparisons of relative levels of pollution between the regions.
Although the two techniques are quite unrelated, it is shown that in general, extremely high or low measures of the relative abundances in the regions are associated with correspondingly high or low values of the enrichment factors in the same regions that were reported in the 2009 paper. That is, the statistical analysis confirms the results of the enrichment factor data in the identification of the most to the least polluted regions. In an additional analysis, the residue term was excluded from each sediment sample by rescaling the 16 element concentrations to sum to 100%, thus forming 59 residue-free sub-compositions. Crude correlation coefficients were computed for pairs of elements of this sub-compositional data. These revealed that certain correlations based on the initial raw data that were reported in the 2009 paper for the same pairs of elements, were not only inconsistent, but sometimes also contradictory. Such contradictions imply that intrinsic geochemical element associations inferred in that paper from such correlations were false.
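The inconsistency caused by removing the residue term can be reproduced on a toy example. The four samples below (parts A, B, C and a residue R, in %) are invented for illustration, not the Polish sediment data. The crude correlation between A and B is computed before and after forming the residue-free subcomposition; for contrast, the variance of ln(A/B) is computed too, since ratios between retained parts are unaffected by rescaling.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def var(a):
    """Sample variance."""
    n = len(a)
    m = sum(a) / n
    return sum((u - m) ** 2 for u in a) / (n - 1)


def summary(data):
    """Crude correlation of parts A and B, and the variance of ln(A/B)."""
    a = [r[0] for r in data]
    b = [r[1] for r in data]
    lr = [math.log(u / v) for u, v in zip(a, b)]
    return pearson(a, b), var(lr)


# Hypothetical samples: elements A, B, C and a residue R, in % (rows sum to 100).
full = [[1, 2, 7, 90], [4, 3, 13, 80], [3, 6, 11, 80], [6, 5, 19, 70]]
# Residue-free subcomposition: drop R and rescale A, B, C to sum to 100 %.
sub = [[100 * v / sum(r[:3]) for v in r[:3]] for r in full]

r_full, v_full = summary(full)
r_sub, v_sub = summary(sub)
print(f"full composition: corr(A, B) = {r_full:.3f}, var(ln A/B) = {v_full:.4f}")
print(f"subcomposition:   corr(A, B) = {r_sub:.3f}, var(ln A/B) = {v_sub:.4f}")
```

In this example the crude correlation is positive in the full composition and negative in the subcomposition, while var(ln A/B) is the same in both, which is why logratio statistics are preferred for subcompositional work.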

15.
Cluster analysis can be used to group samples and to develop ideas about the multivariate geochemistry of the data set at hand. Due to the complex nature of regional geochemical data (neither normal nor log-normal, strongly skewed, often multi-modal data distributions, data closure), cluster analysis results often strongly depend on the preparation of the data (e.g. choice of the transformation) and on the clustering algorithm selected. Different variants of cluster analysis can lead to surprisingly different cluster centroids, cluster sizes and classifications even when using exactly the same input data. Cluster analysis should not be misused as a statistical "proof" of certain relationships in the data. The use of cluster analysis as an exploratory data analysis tool requires a powerful program system to test different data preparation, processing and clustering methods, including the ability to present the results in a number of easy-to-grasp graphics. Such a tool has been developed as a package for the R statistical software. Two example data sets from geochemistry are used to demonstrate how the results change with different data preparation and clustering methods. A data set from S-Norway with a known number of clusters and cluster membership is used to test the performance of different clustering and data preparation techniques. For a complex data set from the Kola Peninsula, cluster analysis is applied to explore regional data structures.

16.
BLU Estimators and Compositional Data
One of the principal objections to the logratio approach for the statistical analysis of compositional data has been that some of its estimators lack unbiasedness and minimum-variance properties: they appear not to be BLU (best linear unbiased) estimators. Using a geometric approach, we introduce the concepts of metric variance and of a compositional unbiased estimator, and we show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. It therefore has properties analogous to those of the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach gives real meaning to the concepts of measures of central tendency and dispersion and opens up a new way of understanding the statistical analysis of compositional data.
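The estimator named in the abstract, the closed geometric mean, is simple to compute. The sketch below compares it with the arithmetic mean on two hypothetical compositions and checks one property that makes it natural in the simplex geometry (our illustrative choice, not the paper's proof): perturbing every sample by the same p perturbs the estimate by p.

```python
import math


def closure(x):
    """Rescale positive parts to unit sum."""
    s = sum(x)
    return [v / s for v in x]


def closed_geometric_mean(samples):
    """Componentwise geometric mean of the samples, closed to unit sum."""
    n, d = len(samples), len(samples[0])
    g = [math.exp(sum(math.log(s[j]) for s in samples) / n) for j in range(d)]
    return closure(g)


def perturb(x, p):
    """Perturbation: componentwise product followed by closure."""
    return closure([a * b for a, b in zip(x, p)])


# Two hypothetical 3-part compositions.
data = [[0.1, 0.3, 0.6], [0.4, 0.3, 0.3]]
centre = closed_geometric_mean(data)
arith = [sum(col) / len(data) for col in zip(*data)]
print("closed geometric mean:", [round(v, 4) for v in centre])
print("arithmetic mean:      ", arith)
```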

17.
Logratios and Natural Laws in Compositional Data Analysis
The impossibility of interpreting correlations of raw compositional components, and of the statistical methods associated with them, has been clearly demonstrated over the last four decades, and alternative statistical methodology has been developed. Despite this, a return to the traditional use of raw components has recently been advocated, and alternative methodology such as logratio analysis has been strongly criticized. This paper exposes the fallacies in this recent advocacy and demonstrates the constructive role that logratio analysis can play in geological compositional problems, in particular in the investigation of natural laws and in subcompositional investigations.

18.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers interact strongly: the parameters of transformations may be particularly influenced by outliers, and the results of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional data sets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection, together with the likelihood of the transformed data, is presented as a first approach to solving these problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.

19.
The statistical analysis of compositional data based on logratios of parts is not suitable when zeros are present in a data set. Nevertheless, for those who wish to use this modeling approach, several strategies have been published in the specialized literature. In particular, substitution or imputation strategies are available for rounded zeros. In this paper, existing nonparametric imputation methods (both the additive and the multiplicative approach) are reviewed, and essential properties of the latter are given. For missing values, a generalization of the multiplicative approach is proposed.
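The multiplicative replacement of rounded zeros can be sketched as follows. Each zero is set to a small value delta, and the non-zero parts are scaled by one common factor so that the total is unchanged and the ratios between non-zero parts are preserved. For brevity this sketch uses a single delta for every zero, whereas the method allows part-specific values; the delta of 0.5 and the analysis are hypothetical.

```python
def multiplicative_replacement(x, delta):
    """Replace zeros in composition x by `delta` and rescale the non-zero parts
    multiplicatively so the total is unchanged and their ratios are preserved."""
    total = sum(x)
    n_zeros = sum(1 for v in x if v == 0)
    factor = 1 - delta * n_zeros / total
    return [delta if v == 0 else v * factor for v in x]


# Hypothetical analysis in % with one rounded zero (value below detection limit).
x = [50.0, 30.0, 20.0, 0.0]
print(multiplicative_replacement(x, delta=0.5))
```

Preserving the ratios among the non-zero parts means that logratios not involving the replaced zero are unchanged, which the additive replacement does not guarantee.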

20.
The Devonian/Carboniferous (D/C) boundary is a critical interval in Phanerozoic history, associated with vigorous climatic perturbations, continental glaciation, a global sea-level fall and rapidly increased extinction rates in marine realms. In many sections world-wide, these global changes left a marked lithological signature, in particular the Hangenberg black shale (a product of deep-shelf anoxia) and the overlying Hangenberg sandstone (a sudden siliciclastic influx into predominantly carbonate depositional environments). Both layers bear a distinct geochemical signature. Even though either or both of these two lithologies are absent at many sections, their correlative counterparts can be indicated by subtle geochemical markers. We studied the elemental geochemistry of fourteen D/C boundary sections in six key areas across Europe with the aim of selecting a globally correlatable elemental proxy for the D/C boundary. Analysis of raw and log-transformed geochemical data (EDXRF, c.p.s. units), which represents the standard approach here, indicates that concentrations of terrigenous elements (Al, K, Rb, Ti and Zr) are mainly controlled by dilution by Ca (carried by marine calcium carbonate) in limestone facies; accordingly, their variations can be related to carbonate production in the sea rather than to terrigenous input from the continent. Nevertheless, owing to the relative nature of geochemical observations, relying solely on statistical processing of raw data might lead to an incomplete picture of the multivariate data structure and/or biased interpretations. For this reason, this contribution discusses logratio alternatives to the standard statistical methods, which may better reflect the relative nature of the data.
For this purpose, principal component analysis was employed to reveal the main geochemical patterns, while the geochemical signature of the D/C boundary was further analysed using Q-mode clustering, which leads to predicative orthonormal logratio coordinates (balances). The comprehensive picture of the multivariate data structure provided by these statistical tools makes them a primary choice for exploratory compositional data analysis. At the same time, the standard and compositional approaches turn out to have synergistic effects, which can be exploited extensively in further geochemical studies.
