Similar literature
 20 similar documents found (search time: 281 ms)
1.
This paper addresses three intractable difficulties associated with the statistical analysis of compositional data, such as percentages or ppm. These are: (1) that such data do not follow multivariate normal distributions, thus rendering standard parametric statistical tests and estimation procedures inappropriate; (2) that the covariance/correlation coefficients between specific pairs of components are determined in whole or in part by the presence or absence of other components; and (3) the negative bias property. That is, at least one covariance, and therefore at least one correlation, must be negative, hence the remaining correlations are prevented from ranging freely between −1 and +1. It follows that correlation coefficients formed from compositional data are not only not absolute, but also frequently spurious. Standard multivariate procedures based on them are unreliable, and intrinsic associations between components inferred from strong positive correlations in particular are potentially false. In a 2009 paper, it was reported that 59 surface sediment samples from 7 regions in the Polish exclusive economic zone had been chemically analyzed for 16 elements. Enrichment factors together with crude correlation coefficients between selected elements were presented. All these quantities were computed from the initial raw compositional data resulting from the chemical analyses. In this paper, a statistical procedure is presented which is distinctly different from the enrichment factor computations based on the same raw compositional data. The procedure generates a log-ratio measure of the abundance of each element in each of the seven regions, thus enabling comparisons of relative levels of pollution between the regions.
Although the two techniques are quite unrelated, it is shown that in general, extremely high or low measures of the relative abundances in the regions are associated with correspondingly high or low values of the enrichment factors in the same regions that were reported in the 2009 paper. That is, the statistical analysis confirms the results of the enrichment factor data in the identification of the most to the least polluted regions. In an additional analysis, the residue term was excluded from each sediment sample by rescaling the 16 element concentrations to sum to 100%, thus forming 59 residue-free sub-compositions. Crude correlation coefficients were computed for pairs of elements of this sub-compositional data. These revealed that certain correlations based on the initial raw data that were reported in the 2009 paper for the same pairs of elements, were not only inconsistent, but sometimes also contradictory. Such contradictions imply that intrinsic geochemical element associations inferred in that paper from such correlations were false.
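
The negative bias property described in this abstract can be illustrated numerically. The sketch below is not taken from the paper: the data are simulated (59 hypothetical samples of 4 independent positive parts), and the helper names `closure` and `covariance_matrix` are assumptions made for illustration. Because every closed sample sums to the same constant, each row of the covariance matrix sums to zero, so every part must have at least one negative covariance.

```python
import random


def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (e.g. 100 %)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


random.seed(0)
# Hypothetical raw abundances: independent, so no intrinsic association exists.
raw = [[random.lognormvariate(0.0, 0.5) for _ in range(4)] for _ in range(59)]
closed = [closure(r) for r in raw]
cov = covariance_matrix(closed)

# Row sums vanish (up to rounding): var(x_i) + sum of covariances with the
# other parts = cov(x_i, constant) = 0, which forces a negative covariance.
row_sums = [sum(row) for row in cov]
most_negative = [
    min(row[j] for j in range(len(row)) if j != i) for i, row in enumerate(cov)
]
```

Any negative entries found this way arise from the closure operation alone, since the simulated raw parts were generated independently.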

2.
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability and the possible processes associated with compositional data sets from many disciplines. In this paper, we concentrate on geochemical data. First, we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of sub-compositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, such as the singular value decomposition of a compositional data set. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major oxide and trace element compositions of metamorphosed limestones from the Grampian Highlands of Scotland. Finally, we discuss some unresolved problems in the statistical analysis of compositional processes.

3.
Estimation of regionalized compositions: A comparison of three methods   Total citations: 1 (self-citations: 0, citations by others: 1)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant sum constraint, is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.
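
For readers unfamiliar with the additive-log-ratio transformation mentioned in this abstract, a minimal sketch follows. It assumes the common convention of using the last part as the divisor; the function names `alr` and `alr_inverse` and the three-part example are hypothetical and are not taken from the paper.

```python
import math


def alr(x):
    """Additive log-ratio: log of each part relative to the last part."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def alr_inverse(y, total=1.0):
    """Map alr coordinates back to a composition that sums to `total`."""
    expd = [math.exp(v) for v in y] + [1.0]  # the divisor part maps to exp(0)
    s = sum(expd)
    return [total * e / s for e in expd]


composition = [0.2, 0.3, 0.5]  # hypothetical three-part composition
coords = alr(composition)      # two unconstrained real coordinates
back = alr_inverse(coords)     # recovers the original composition
```

In a logratio workflow of the kind the abstract describes, estimation would be carried out on the unconstrained coordinates, and the inverse would return the estimates to positive parts with a constant sum.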

4.
On the Interpretation of Orthonormal Coordinates for Compositional Data   Total citations: 1 (self-citations: 0, citations by others: 1)
The simplex with the Aitchison geometry is a natural sample space for compositional data, that is, observations carrying only relative information (especially proportions, percentages, etc., often occurring in the geosciences). For this reason, standard statistical methods that rely on the Euclidean structure of real space cannot be used directly for statistical analysis. First, compositional data need to be expressed in coordinates of an orthonormal basis on the simplex (with respect to the Aitchison geometry). The mathematical interpretation of the orthonormal coordinates is derived from the procedure by which they are constructed (called sequential binary partition), and they act as balances between groups of compositional parts. The goal of this paper is to describe the covariance structure of coordinates and, consequently, to provide a complementary interpretation based on log-ratios of parts of the original composition. It must be noted that, in a composition, the ratios themselves contain all the relevant information. The possibilities as well as the limitations of this approach are demonstrated through illustrative examples.
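
A balance between two groups of parts, as produced by one step of a sequential binary partition, can be sketched as follows. The sketch uses the standard balance formula, a normalised log-ratio of the geometric means of the two groups; the function names, the index lists `plus` and `minus`, and the example composition are hypothetical and not taken from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, plus, minus):
    """Balance between the parts indexed by `plus` and those by `minus`.

    Computes sqrt(r*s/(r+s)) * ln(g(plus) / g(minus)), where r and s are
    the group sizes and g denotes the geometric mean of a group.
    """
    r, s = len(plus), len(minus)
    g_plus = geometric_mean([x[i] for i in plus])
    g_minus = geometric_mean([x[i] for i in minus])
    return math.sqrt(r * s / (r + s)) * math.log(g_plus / g_minus)


x = [0.1, 0.2, 0.3, 0.4]  # hypothetical four-part composition
b1 = balance(x, plus=[0, 1], minus=[2, 3])
# Rescaling the composition (e.g. proportions to percentages) leaves the
# balance unchanged, because only ratios of parts enter the formula.
b1_scaled = balance([100 * v for v in x], plus=[0, 1], minus=[2, 3])
```

The sign of a balance indicates which group dominates: here `b1` is negative because the parts in `minus` are, on geometric average, larger than those in `plus`.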

5.
Thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources. Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA. The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) salinity and increased water-rock interaction; 2) dissolution of carbonates; and 3) dissolution of sulfides, sulfates and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods, or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes governing the hydrochemistry of a group of thermal springs in a low-enthalpy setting.

6.
《Chemical Geology》2006,225(1-2):1-15
Microprobe monazite dating has been increasingly used to constrain the timing of deformation and metamorphism because of the potential to date very small monazite domains (down to 5 μm or less) in structural and petrologic context. This paper presents an analytical strategy, presentation format, and error considerations for microprobe monazite dating. The strategy involves high-resolution compositional mapping to delineate compositional domains within monazite crystals. Then for each compositional domain, a series of Th, U and Pb analyses are made, and a single date and error are calculated. The number of analyses in each domain is determined by the desired statistical precision of the date. Results from several monazite grains are typically combined and, along with textural relationships, are used to build an argument that the dates constrain the age of a deformation or metamorphic event. The total error involves three components: short-term random error (dominated by counting statistical uncertainty), short-term systematic error (uncertainty in background correction, conductive coating variation, and calibration), and long-term systematic error (uncertainty in standard composition, mass absorption factors, decay constants, etc.). In homogeneous compositional domains, short-term random errors (2σ) of less than 10 m.y. can be obtained from five to ten analyses. However, short-term systematic error, mainly background estimation uncertainty, would typically result in a doubling of the magnitude of random error. Microprobe dates are presented as a single Gaussian probability distribution for each domain, along with representative compositional maps. It is recommended that a consistency standard be analyzed during each analytical session and the results be reported along with those from the unknown. 
This proposed strategy and format are compatible with those of other geochronological techniques; they incorporate analytical limitations associated with trace, as opposed to major element, microprobe analysis, and will allow better comparisons to be made between labs and between different geochronological techniques.

7.
Compositional data analysis   Total citations: 1 (self-citations: 0, citations by others: 1)
Compositional data occur naturally in the geosciences: tables of chemical analyses, rock compositions, sedimentary proportions, pollen-analytical tables, etc. The statistical analysis of such data requires special techniques and it is not possible to use standard methods of computing correlation coefficients and carry out multivariate statistical analyses without the risk of incurring grave mistakes. The special property of compositional data, to wit, the fact that the determinations on each specimen sum to a constant, means that the variables involved in the study occur in constrained space defined by the simplex, a restricted part of real space.

8.
9.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has been traditionally plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable, parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions which have now opened up a substantial and meaningful array of statistical methodology for compositional data. From the motivation of a wide variety of practical geological problems we examine the range of possibilities with this new approach to the constant-sum problem.

10.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data always will be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.

11.
Do We Really Need Mantle Components to Define Mantle Composition?   Total citations: 2 (self-citations: 0, citations by others: 2)
We discuss the concept of components in the Earth's mantle starting from a petrological and geochemical approach, but adopting a new method of projection of geochemical and isotopic data. This allows the compositional variability of magmatic associations to be evaluated in multi-dimensional space, thus simultaneously accounting for a large number of compositional variables. We demonstrate that ocean island basalts (OIB) and mid-ocean ridge basalts (MORB) are derived from a marble-cake mantle, in which different degrees of partial melting of recycled lithosphere, which are heterogeneous in age and composition, contribute to the magma genesis. This view is supported by the variability in the geochemical and isotopic signatures of OIB that are observed on the scale of a single ocean island as well as on that of an ocean, mostly varying between two extreme compositions, that are not strictly related to the commonly accepted mantle components (DMM, EMI, EMII, HIMU). Rather they are a distinctive feature of the mantle source sampled at each ocean island and are strongly dependent on the Pb isotope system. We recommend a change in perspective in studies of MORB–OIB geochemistry from one based on physically distinct mantle components to a model based on the existence of a marble-cake-like upper mantle. Although resembling the statistical upper mantle, this model implies that geochemical homogenization can be attained only within the limits of local mantle composition, so that a world-wide uniform depleted reservoir cannot be sampled by simply extending the volume of the region undergoing partial melting. KEY WORDS: geochemistry; isotope; mantle; OIB

12.
BLU Estimators and Compositional Data   Total citations: 5 (self-citations: 0, citations by others: 5)
One of the principal objections to the logratio approach for the statistical analysis of compositional data has been the absence of unbiasedness and minimum variance properties of some estimators: they seem not to be BLU estimators. Using a geometric approach, we introduce the concept of metric variance and of a compositional unbiased estimator, and we show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. Thus, it satisfies analogous properties to the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach used gives real meaning to the concepts of measure of central tendency and measure of dispersion and opens up a new way of understanding the statistical analysis of compositional data.
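
The closed geometric mean named in this abstract can be computed in a few lines. The sketch below only shows the computation, not the c-BLU argument; the function names `closure` and `center` and the three sample compositions are hypothetical.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def center(compositions):
    """Closed geometric mean: part-wise geometric means, then closure."""
    n = len(compositions)
    d = len(compositions[0])
    geo_means = [
        math.exp(sum(math.log(c[j]) for c in compositions) / n)
        for j in range(d)
    ]
    return closure(geo_means)


# Hypothetical sample of three 3-part compositions (each sums to 1).
data = [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
c = center(data)
```

The closure step is needed because part-wise geometric means of compositions do not, in general, sum to the original constant.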

13.
Compositional Geometry and Mass Conservation   Total citations: 1 (self-citations: 0, citations by others: 1)
A geometrical structure is imposed on compositional data by physical and chemical laws, principally mass conservation. Therefore, statistical or mathematical investigation of possible relations between data values and such laws must be consistent with this structure. This demands that geometrical concepts, such as points that specify both mass and composition in linear space, and lines in projective space that specify composition only, be clearly defined and consistent with mass conservation. Mass thus becomes the norm in composition space in place of the Euclidean norm of ordinary space. Coordinate transformations inconsistent with this geometry are accordingly unnatural and misleading. They are also unnecessary because correlation arising from the constant mass presents no unusual difficulty in the analysis of the underlying quadratic form.

14.
Outlier detection is often a key task in a statistical analysis and helps guard against poor decision-making based on results that have been influenced by anomalous observations. For multivariate data sets, large Mahalanobis distances in raw data space, or large Mahalanobis distances in data space transformed by principal components analysis, are routinely used to detect outliers. Detection in principal components analysis space can also utilise goodness of fit distances. For spatial applications, however, these global forms can only detect outliers in a non-spatial manner. This can result in false positive detections, such as when an observation's spatial neighbours are similar, or false negative detections such as when its spatial neighbours are dissimilar. To avoid mis-classifications, we demonstrate that a local adaptation of various global methods can be used to detect multivariate spatial outliers. In particular, we account for local spatial effects via the use of geographically weighted data with either Mahalanobis distances or principal components analysis. Detection performance is assessed using simulated data as well as freshwater chemistry data collected over all of Great Britain. Results clearly show the value of both geographically weighted methods for outlier detection.

15.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers strongly interact: parameters of transformations may be influenced particularly by outliers, and the result of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional datasets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection together with the likelihood of transformed data is presented as a first approach to solve those problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.

16.
Chemical reactions in aqueous geochemical systems are driven by nonequilibrium conditions, and their dynamics can be deduced through the distributional analysis (identification of probability laws) of complex compositional indices. In this perspective, compositional data analysis offers the possibility to investigate the behavior of the composition as a whole instead of isolated chemical species, with the awareness that multispecies systems are characterized by the simultaneous interactions among all their parts. We addressed this problem using D−1 isometric log-ratio coordinates describing the D-part compositional dataset of the river chemistry of the Alpine region (D being the number of variables), thus working in the \(\mathbb{R}^{D-1}\) statistical sample space. The D−1 coordinates were chosen using the decreasing variance criterion so that each one could provide information about different space–time properties for the investigated geochemical system. Coordinates dominated by heterogeneity appear to be able to capture regime shifts only on a long-time period and monitor processes on a very wide scale. On the other hand, coordinates characterized by lower variability present multimodality, thus capturing the presence of alternative states in the analyzed spatial domain also for the current time. Further developments are needed to determine the ranges of conditions for which variability and other statistics can be useful indicators of regime shifts on different time–space scales in geochemical systems.

17.
张强勇  林春金  向文 《岩土力学》2006,27(10):1831-1834
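Seepage flow and seepage pressure are important factors affecting the stability of dam foundations. Based on a statistical analysis of many years of observed seepage flow and seepage pressure data from the foundation of the gravity dam at the Shiban (石板) Hydropower Station, and taking into account the effects of water level, temperature, and time, a statistical regression model of foundation seepage flow and seepage pressure was established using stepwise regression analysis. The regression results show that the statistically computed seepage flow and seepage pressure agree well with the observed values, with a large multiple correlation coefficient and a small standard error of estimate. The statistical regression model effectively reflects the variation patterns and development trends of foundation seepage flow and seepage pressure, and provides an effective means of analysis for evaluating the operational safety of the dam.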

18.
Grain-size measurements are a type of compositional data and thus subject to closure effects and nonnormality. The logratio transform of Aitchison successfully resolves these problems in compositional data analysis. An application to modern sediment data from the northern part of the South China Sea demonstrates that logratio principal components analysis provides a clear separation of data which cannot be obtained by ordinary principal components analysis, and that cluster analysis using logratio principal components gives a much better classification of sediments than does cluster analysis using raw data. The delineation of sedimentary environments on the basis of a logratio classification of sediment samples provides a better understanding of hydrodynamic conditions on the shelf.

19.
Shallow groundwater is one of the main water resources in arid and semi-arid regions. However, it is threatened not only by reduced rainfall and recharge capacity, but also by water table drawdown and seawater intrusion. Such factors could cause a deterioration of the water quality and consequently the loss of a valuable hydraulic resource. This study aimed to improve our knowledge of the groundwater chemical quality evolution of the Sfax shallow aquifer, one of the most vulnerable areas in Tunisia, by developing a geochemical study using statistical and numerical methods. Salinization was identified by factorial analysis, PCA, and hierarchical clustering analysis in addition to the numerical MODPATH model. These findings confirmed that the groundwater quality has deteriorated due to natural and anthropogenic processes with a different influence of mineralization factors. They also revealed the location of seawater intrusion by focusing on the most vulnerable areas, which are Chaffar and Djbeniana. Methodologically, the use of the MODPATH model for seawater intrusion determination might be considered as the backbone for future studies in Tunisian coastal aquifers. The numerical model supports the results obtained by the geochemical analysis. Both methods are valuable tools as they contribute to trend determinations, management, and recovery plans.

20.
This paper describes a geostatistical method, known as factorial kriging analysis, which is well suited for analyzing multivariate spatial information. The method involves multivariate variogram modeling, principal component analysis, and cokriging. It uses several separate correlation structures, each corresponding to a specific spatial scale, and yields a set of regionalized factors summarizing the main features of the data for each spatial scale. This method is applied to an area of high manganese-ore mining activity in Amapá State, North Brazil. Two scales of spatial variation (0.33 and 2.0 km) are identified and interpreted. The results indicate that, for the short-range structure, manganese, arsenic, iron, and cadmium are associated with human activities due to the mining work, while for the long-range structure, the high aluminum, selenium, copper, and lead concentrations seem to be related to the natural environment. At each scale, the correlation structure is analyzed, and regionalized factors are estimated by cokriging and then mapped.
