Similar literature
 20 similar articles found; search time 187 ms
1.
The statistical analysis of compositional data based on logratios of parts is not suitable when zeros are present in a data set. Nevertheless, if there is interest in using this modeling approach, several strategies published in the specialized literature can be used. In particular, substitution or imputation strategies are available for rounded zeros. In this paper, existing nonparametric imputation methods (both for the additive and the multiplicative approach) are revised and essential properties of the latter method are given. For missing values, a generalization of the multiplicative approach is proposed.
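
As a rough illustration of the multiplicative approach mentioned in this abstract, the sketch below replaces each rounded zero with a small value and shrinks the non-zero parts by a common factor, so that the total and the ratios among the non-zero parts are unchanged. The function name, the single `delta` shared by all zeros, and the example composition are assumptions made for illustration; the abstract itself does not state the formula.

```python
def multiplicative_replacement(x, delta, total=1.0):
    """Replace zeros with `delta`; rescale non-zero parts so the sum stays `total`."""
    n_zeros = sum(1 for v in x if v == 0)
    # Common shrink factor applied to every non-zero part.
    shrink = 1.0 - n_zeros * delta / total
    return [delta if v == 0 else v * shrink for v in x]


x = [0.6, 0.4, 0.0]  # hypothetical composition with one rounded zero
r = multiplicative_replacement(x, delta=0.01)
print(r, sum(r))
```

Because every non-zero part is multiplied by the same factor, the ratio between the first two parts is the same before and after replacement, which is the property that makes the multiplicative form attractive for logratio analysis.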

2.
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability and the possible processes associated with compositional data sets from many disciplines. In this paper, we concentrate on geochemical data. First, we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of sub-compositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained, together with the necessary tools for a staying-in-the-simplex approach, such as the singular value decomposition of a compositional data set. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major oxide and trace element compositions of metamorphosed limestones from the Grampian Highlands of Scotland. Finally, we discuss some unresolved problems in the statistical analysis of compositional processes.

3.
A variety of approaches to the testing of distributional forms for compositional data has appeared in the literature, all based on logratio or Box–Cox transformation techniques and to a degree dependent on the divisor chosen in the formation of ratios for these transformations. This paper, recognizing the special algebraic–geometric structure of the standard simplex sample space for compositional problems, the use of the fundamental simplicial singular value decomposition, and an associated power-perturbation characterization of compositional variability, attempts to provide a definitive approach to such distributional testing problems. Our main consideration is the characterization and testing of additive logistic–normal form, but we also indicate possible applications to logistic skew normal forms once a full range of multivariate tests emerges. The testing strategy is illustrated with both simulated data and applications to some real geological compositional data sets.

4.
The high dimensionality of many compositional data sets has caused geologists to look for insights into the observed patterns of variability through two dimension-reducing procedures: (i) the selection of a few subcompositions for particular study, and (ii) principal component analysis. After a brief critical review of the unsatisfactory state of current statistical methodology for these two procedures, this paper takes as a starting point for the resolution of persisting difficulties a recent approach to principal component analysis through a new definition of the covariance structure of a composition. This approach is first applied for expository purposes to a small illustrative compositional data set and then to a number of larger published geochemical data sets. The new approach then leads naturally to a method of measuring the extent to which a subcomposition retains the pattern of variability of the whole composition and so provides a criterion for the selection of suitable subcompositions. Such a selection process is illustrated by application to geochemical data sets.

5.
Measuring Subcompositional Incoherence   Total citations: 2, self-citations: 0, citations by others: 2
Subcompositional coherence is a fundamental property of Aitchison’s approach to compositional data analysis and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence (i.e., incoherence) can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods that might be better suited to cope with problems such as data zeros and outliers while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix that can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.

6.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has been traditionally plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable, parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions which have now opened up a substantial and meaningful array of statistical methodology for compositional data. From the motivation of a wide variety of practical geological problems we examine the range of possibilities with this new approach to the constant-sum problem.
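
To make the idea of moving between the simplex and Euclidean space concrete, here is a minimal sketch of one such pair of transformations: the additive logratio and its inverse, the additive logistic transformation. Choosing the last part as the divisor, the function names, and the example composition are all assumptions for illustration rather than anything specified in the abstract.

```python
import math


def alr(x):
    """Additive logratio: log of each part divided by the last part."""
    return [math.log(v / x[-1]) for v in x[:-1]]


def alr_inverse(y):
    """Additive logistic transformation back to the unit simplex."""
    e = [math.exp(v) for v in y] + [1.0]
    s = sum(e)
    return [v / s for v in e]


x = [0.2, 0.3, 0.5]      # hypothetical three-part composition
y = alr(x)               # unconstrained point in two-dimensional real space
x_back = alr_inverse(y)  # recovers the original composition
print(y, x_back)
```

The transformed vector has one coordinate fewer than the composition and is free of the constant-sum constraint, so standard multivariate methods can be applied to it before mapping results back.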

7.
BLU Estimators and Compositional Data   Total citations: 5, self-citations: 0, citations by others: 5
One of the principal objections to the logratio approach for the statistical analysis of compositional data has been the absence of unbiasedness and minimum variance properties of some estimators: they seem not to be BLU estimators. Using a geometric approach, we introduce the concept of metric variance and of a compositional unbiased estimator, and we show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. Thus, it satisfies analogous properties to the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach used gives real meaning to the concepts of measure of central tendency and measure of dispersion and opens up a new way of understanding the statistical analysis of compositional data.
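
The closed geometric mean named in this abstract can be computed in a few lines. The sketch below is only an illustration: the function names and the two-sample data set are hypothetical, and the parts are assumed to be strictly positive.

```python
import math


def closure(x, total=1.0):
    """Rescale the parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def center(data):
    """Closed geometric mean: part-wise geometric means, then closure."""
    n = len(data)
    d = len(data[0])
    gm = [math.exp(sum(math.log(row[j]) for row in data) / n) for j in range(d)]
    return closure(gm)


data = [
    [0.1, 0.2, 0.7],  # hypothetical compositions
    [0.2, 0.1, 0.7],
]
c = center(data)
print(c)
```

In this symmetric example the first two parts of the center are equal, as expected when the two samples simply swap those parts.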

8.
Spurious Clusters in Granulometric Data Caused by Logratio Transformation   Total citations: 1, self-citations: 0, citations by others: 1
The logratio transformation aims to eliminate spurious correlations between components of compositional data. When applying this method to granulometric data, there arise numerical problems with zero (empty) components. In this paper, the method of logratio transformation with zero replacement is examined using one natural and two simulated granulometric data sets. The results show that this method generates spurious clusters, and thus it is not appropriate for the investigation of grain-size data in particular, and of compositional data with zero components in general.

9.
Compositional data are very common in the earth sciences. Nevertheless, little attention has been paid to the spatial interpolation of these data sets. Most interpolators do not necessarily satisfy the constant sum and nonnegativity constraints of compositional data, nor take spatial structure into account. Therefore, compositional kriging is introduced as a straightforward extension of ordinary kriging that complies with these constraints. In two case studies, the performance of compositional kriging is compared with that of the additive logratio-transform. In the first case study, compositional kriging yielded significantly more accurate predictions than the additive logratio-transform, while in the second case study the performances were comparable.

10.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data will always be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.

11.
On the Interpretation of Orthonormal Coordinates for Compositional Data   Total citations: 1, self-citations: 0, citations by others: 1
The simplex with the Aitchison geometry is a natural sample space for compositional data, that is, observations carrying only relative information (especially proportions, percentages, etc., often occurring in the geosciences). For this reason, standard statistical methods that rely on Euclidean structure of the real space cannot be used directly for statistical analysis. At first, compositional data need to be expressed in coordinates of an orthonormal basis on the simplex (with respect to the Aitchison geometry). The mathematical interpretation of the orthonormal coordinates is derived from the procedure by which they are constructed (called sequential binary partition), and they act as balances between groups of compositional parts. The goal of this paper is to describe the covariance structure of coordinates and, consequently, to provide a complementary interpretation based on log-ratios of parts of the original composition. It must be noted that, in a composition, the ratios themselves contain all the relevant information. The possibilities as well as the limitations of this approach are demonstrated through illustrative examples.
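
The notion of a balance between two groups of parts can be sketched as follows, using the usual normalized log-ratio of group geometric means. The function names, the example composition, and the particular partition (part 1 against parts 2 and 3, then part 2 against part 3) are assumptions chosen for illustration.

```python
import math


def gmean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, num, den):
    """Balance between the parts indexed by `num` and those indexed by `den`."""
    r, s = len(num), len(den)
    ratio = gmean([x[i] for i in num]) / gmean([x[i] for i in den])
    return math.sqrt(r * s / (r + s)) * math.log(ratio)


x = [0.5, 0.25, 0.25]          # hypothetical composition
b1 = balance(x, [0], [1, 2])   # step 1: part 1 against parts 2 and 3
b2 = balance(x, [1], [2])      # step 2: part 2 against part 3
print(b1, b2)
```

Since each balance depends only on ratios of parts, multiplying the whole composition by a constant leaves both coordinates unchanged, which matches the abstract's remark that the ratios carry all the relevant information.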

12.
In recognizing that a composition, such as a major oxide or sediment composition, provides information only about the relative, not the absolute, magnitudes of its components, this paper exposes the compositional variation array as the simplest and minimum way of summarizing the pattern of variability within a compositional data set. Such summaries are free of the notorious hazards of the constant-sum constraint and when depicted in relative variation diagrams can often provide substantial insights into the nature of the compositional variability. Concepts and practice are illustrated by reference to a number of real data sets.
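
One ingredient of such a summary is the table of variances of pairwise log-ratios. The abstract does not spell out the array's exact layout, so the sketch below only computes these log-ratio variances; the function name, the use of the sample variance, and the small data set are assumptions.

```python
import math
import statistics


def logratio_variances(data):
    """Return T with T[i][j] = sample variance of log(x_i / x_j) over the rows."""
    d = len(data[0])
    return [
        [
            statistics.variance([math.log(row[i] / row[j]) for row in data])
            for j in range(d)
        ]
        for i in range(d)
    ]


data = [
    [0.10, 0.20, 0.70],  # hypothetical compositions in which part 1
    [0.20, 0.40, 0.40],  # is always half of part 2
    [0.15, 0.30, 0.55],
]
T = logratio_variances(data)
for row in T:
    print(row)
```

A near-zero entry, as for parts 1 and 2 here, indicates two parts that keep a constant ratio across samples, while large entries flag pairs whose relative magnitudes vary strongly.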

13.
The reflectance of vitrinite (collotelinite) particles is widely used as a geothermometer for estimating the thermal maturity of organic matter enclosed in rocks. However, several problems have arisen over recent decades, which can be traced back to three basic causes: human mistakes, technical problems, and problems associated with the structural and compositional inhomogeneity of organic matter. Whilst in most cases the first two types of uncertainty can be handled by standardization, the third can cause significant problems during interpretation because its effect generally cannot be estimated. The suppression of vitrinite reflectance, statistical problems originating from small sample size, and outliers belong to this last type. International standards, such as the ASTM and the ISO, define the vitrinite reflectance parameter as a statistical average of measured data, disregarding the fact that the average is neither a resistant nor a robust statistical parameter. In other words, the average is very sensitive to outliers and to the distribution. The aim of this research was to find and test a better, more resistant, and more robust statistical parameter used by traditional parametric and nonparametric statistics, which can be applied in practice instead of the average. Three categories of statistical problems were studied on coal and disperse organic matter (DOM) samples: the distribution of measured values, the effect of data number, and the effect of outliers on statistical parameters. The statistical experiments carried out on numerous original and generated sample sets show that the median (med) and the most frequent value (Mn), a special weighted average, are better parameters for estimating the thermal maturity of organic matter, especially above 1% reflectance.
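
The sensitivity of the average to outliers, contrasted with the resistance of the median, can be seen in a toy example. The reflectance values below are invented for illustration and are not taken from the study; the most frequent value (Mn) is not implemented here because the abstract does not define it.

```python
import statistics

# Hypothetical reflectance readings (%), plus one outlying measurement.
clean = [1.02, 1.05, 1.03, 1.04, 1.06]
with_outlier = clean + [2.50]

mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(f"shift of the mean:   {mean_shift:.3f}")
print(f"shift of the median: {median_shift:.3f}")
```

A single outlying reading moves the mean by roughly a quarter of a percentage point in this example, while the median barely changes.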

14.
The study of hydrogeochemical data sets frequently calls for statistical dimension-reducing techniques. It is well known that hydrochemical parameters are compositions and that, for this type of data, the direct application of classical statistical methods based on the correlation matrix yields spurious results. But new results on compositional data analysis have identified the sampling space, the simplex, with a Euclidean space, a fact that allows us to define a simplicial factor analysis strategy, thus overcoming the problem. For illustration, we use samples from the Llobregat River and its tributaries (NE Spain). Three unobservable or latent factorial components are extracted, which are identified with pristine waters, potash-mining influence and urban sewage influence. These three factorial components or compositional factors are plotted in a factorial ternary diagram, which reflects the relative influence of each one of these factors on each observation.

15.
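Provenance Studies of Chinese Deserts: Review and Prospects (中国沙漠物源研究:回顾与展望)   Total citations: 1, self-citations: 0, citations by others: 1
Fu Xudong (付旭东), Wang Yansong (王岩松). Acta Sedimentologica Sinica (《沉积学报》), 2015, 33(6): 1063-1073
Research on the provenance of desert sediments is of major theoretical and practical significance in aeolian geomorphology, and it is also valuable for linking atmospheric dust emission, loess accumulation, the climate system, and marine biogeochemical cycles. Following a brief review of desert research in China, this paper surveys the theories, methods, and main results of provenance studies of Chinese deserts. In light of international trends in sediment provenance analysis, it points out that current provenance studies of the world's deserts all rely on inverse models based on statistics of sediment compositional attributes, and that this research paradigm has shortcomings in data acquisition, processing, and interpretation. These include problems in the sampling design and laboratory analysis of sediments, failure to remove the effect of "grain-size dependence" on sediment composition, failure to log-transform the data, and neglect of the preconditions for applying Dickinson diagrams. Directions for future research on the provenance of Chinese deserts are proposed: (1) systematically study the compositional attributes of the sediments of each desert using correct and uniform sampling designs and analytical methods, and build an attribute database of the sediment compositions of Chinese deserts; (2) select several typical deserts and, using detailed data on the geological structure, parent rocks, and climate of the surrounding mountains, quantitatively construct forward models of sediment generation that simulate the quantity, composition, and texture of the sediment generated in the source areas, and validate and calibrate these models against the compiled compositional attribute data; (3) quantitatively assess the contribution rates and transport pathways of fluvial alluvium, alluvial-lacustrine deposits, proluvial-alluvial deposits, and residual and slope deposits from bedrock weathering to the provenance of each Chinese desert, study the formation mechanisms of fine-grained material in Chinese deserts, and compare the provenance formation mechanisms of Chinese deserts with those of low-latitude deserts; (4) quantitatively study the intrinsic links among the provenance of desert, loess, and deep-sea sediments on historical and geological time scales, together with their driving factors, and establish a mechanistic model of the land-atmosphere-ocean material cycle.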

16.
In analyses of compositional data, it is important to select a suitable unchanging component as a reference to detect the behavior of a single variable in isolation. This paper introduces two tests for detecting the unchanging component, based on a new approach that utilizes the coefficient of variation of component ratios. That is, the coefficient of variation of a compositional ratio is subject to change when the unchanging component is switched between the denominator and numerator, and the coefficient of variation tends to be small when the unchanging component occurs as the denominator against any arbitrary components (Test 1). In addition, the ratio of the component pair that gives the lowest coefficient of variation is most likely to represent the two unchanging components (Test 2). However, Tests 1 and 2 are not necessary and sufficient conditions for uniquely finding the unchanging component. To verify the effectiveness of the tests, 500 artificial datasets were analyzed and the results suggest that the tests are able to identify the unchanging component, although Test 1 underperforms when the dataset includes a component with skewness greater than 0.5, and Test 2 fails when the dataset includes components with a correlation coefficient greater than 0.75. These defects can be overcome by interpreting the two test results in a complementary manner. The proposed tests provide powerful yet simple criteria for identifying the unchanging component in compositional data; however, the reliability of this approach needs to be assessed in further studies.

17.
A new method has been developed to separate the compositional variations in ocean island basalts into those that result from variations in source composition and from the melting process itself. The approach depends on correlations between isotope ratios, which can only come from source inhomogeneities, and elemental concentrations. Analysis of three data sets shows that the inhomogeneities beneath Theistareykir, in NE Iceland, Kilauea and Pitcairn can be produced by subduction of oceanic islands and volcanic ridges. The thicknesses of the lithosphere on which such islands were constructed and potential temperatures of the plumes that produced them can be estimated from the geochemical observations. Model ages are harder to determine, though simple assumptions give about 400 Ma for the Theistareykir source and 1.2 Ga for Kilauea. The model may also provide a physical explanation for the commonly used isotopic classification of ocean island basalts, with the isotopic composition changing from HIMU through EMII to EMI as the melt fraction increases. These results have been obtained from a small number of data sets obtained from ocean island basalts erupted in small areas during short time intervals. More such observations are needed to discover whether geochemical observations from other islands are consistent with the same model.

18.
Perturbation is an operation defined on the simplex and can be used for centering compositional data in a ternary diagram, applying objective criteria. Because a straight line in the original diagram is still a straight line in the perturbed diagram, gridlines or compositional fields defined by straight lines can easily be included in the operation. Simultaneous perturbation of data, gridlines, and/or compositional fields is shown to improve both visualization and graphical interpretation of compositions in ternary diagrams. This is illustrated by some examples using simulated as well as real data.
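
A minimal sketch of centering by perturbation is given below. It assumes that the perturbing vector is the inverse of the closed geometric mean of the data, which is one common choice; the abstract does not state which criterion the authors use, and the function names and data are hypothetical.

```python
import math


def closure(x, total=1.0):
    """Rescale the parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def perturb(x, p):
    """Perturbation: part-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, p)])


def center(data):
    """Closed geometric mean of the rows of `data`."""
    n, d = len(data), len(data[0])
    return closure(
        [math.exp(sum(math.log(row[j]) for row in data) / n) for j in range(d)]
    )


def centered(data):
    """Perturb every composition by the inverse of the data center."""
    inverse = closure([1.0 / v for v in center(data)])
    return [perturb(row, inverse) for row in data]


data = [
    [0.90, 0.06, 0.04],  # hypothetical data crowded near one vertex
    [0.85, 0.10, 0.05],
    [0.92, 0.05, 0.03],
]
moved = centered(data)
print(center(moved))
```

After the operation the center of the data sits at the barycenter of the ternary diagram, so points that were crowded near a vertex spread out and become easier to compare.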

19.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers strongly interact: parameters of transformations may be influenced particularly by outliers, and the result of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional datasets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection together with the likelihood of transformed data is presented as a first approach to solve those problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.

20.
This paper addresses three intractable difficulties associated with the statistical analysis of compositional data, such as percentages or ppm. These are: (1) such data do not follow multivariate normal distributions, which renders standard parametric statistical tests and estimation procedures inappropriate; (2) the covariance/correlation coefficients between specific pairs of components are determined in whole or in part by the presence or absence of other components; and (3) the negative bias property, that is, at least one covariance, and therefore at least one correlation, must be negative, so the remaining correlations are prevented from ranging freely between −1 and +1. It follows that correlation coefficients formed from compositional data are not only not absolute, but also frequently spurious. Standard multivariate procedures based on them are unreliable, and intrinsic associations between components inferred from strong positive correlations in particular are potentially false. In a 2009 paper, it was reported that 59 surface sediment samples from 7 regions in the Polish exclusive economic zone had been chemically analyzed for 16 elements. Enrichment factors together with crude correlation coefficients between selected elements were presented. All these quantities were computed from the initial raw compositional data resulting from the chemical analyses. In this paper, a statistical procedure is presented which is distinctly different from the enrichment factor computations based on the same raw compositional data. The procedure generates a log-ratio measure of the abundance of each element in each of the seven regions, thus enabling comparisons of relative levels of pollution between the regions.
Although the two techniques are quite unrelated, it is shown that in general, extremely high or low measures of the relative abundances in the regions are associated with correspondingly high or low values of the enrichment factors in the same regions that were reported in the 2009 paper. That is, the statistical analysis confirms the results of the enrichment factor data in the identification of the most to the least polluted regions. In an additional analysis, the residue term was excluded from each sediment sample by rescaling the 16 element concentrations to sum to 100%, thus forming 59 residue-free sub-compositions. Crude correlation coefficients were computed for pairs of elements of this sub-compositional data. These revealed that certain correlations based on the initial raw data that were reported in the 2009 paper for the same pairs of elements were not only inconsistent, but sometimes also contradictory. Such contradictions imply that intrinsic geochemical element associations inferred in that paper from such correlations were false.
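
The negative bias property described in this abstract follows from the constant sum: the covariances of any one part with all parts, itself included, add up to zero, so a part with positive variance must have at least one negative covariance. The sketch below checks this numerically on a small invented data set; the function name and the data are assumptions and have no connection to the sediment samples discussed above.

```python
def covariance_matrix(data):
    """Sample covariance matrix of the columns of `data`."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


data = [
    [0.10, 0.30, 0.60],  # hypothetical compositions, each summing to 1
    [0.20, 0.30, 0.50],
    [0.25, 0.35, 0.40],
    [0.40, 0.20, 0.40],
]
cov = covariance_matrix(data)
row_sums = [sum(row) for row in cov]
print(row_sums)  # each entry is zero up to rounding error
```

Because every row of the covariance matrix sums to zero, the matrix is also singular, which is why raw compositional covariances cannot be inverted directly.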
