Similar Articles
20 similar articles found.
1.
Spurious Clusters in Granulometric Data Caused by Logratio Transformation   Citations: 1 (self: 0, by others: 1)
The logratio transformation aims to eliminate spurious correlations between components of compositional data. When this method is applied to granulometric data, numerical problems arise from zero (empty) components. In this paper, logratio transformation with zero replacement is examined using one natural and two simulated granulometric data sets. The results show that this method generates spurious clusters. It is therefore not appropriate for investigating grain-size data in particular, or compositional data with zero components in general.
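The Python sketch below shows the kind of procedure this abstract examines. It assumes multiplicative zero replacement, which is one common scheme; the paper's exact scheme is not stated here. The replacement is followed by the centred logratio (clr) transform. The helper names and the toy sample are hypothetical. A replaced zero enters the transform as log(delta), so its clr coordinate is set largely by the arbitrary choice of delta rather than by the data. This is one plausible route by which samples sharing zeros get pulled together into artificial groups.

```python
import math

def closure(x):
    """Rescale non-negative parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def multiplicative_replacement(x, delta):
    """Replace zeros with `delta` and shrink the non-zero parts so the
    composition still sums to 1 (assumes `x` is already closed to 1)."""
    n_zero = sum(1 for v in x if v == 0)
    return [delta if v == 0 else v * (1 - n_zero * delta) for v in x]

def clr(x):
    """Centred logratio: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Hypothetical grain-size composition with an empty first size class.
sample = closure([0.0, 0.2, 0.5, 0.3])
for delta in (1e-3, 1e-5):
    coords = clr(multiplicative_replacement(sample, delta))
    print(delta, [round(c, 2) for c in coords])
```

Consider this four-part example when delta shrinks by two orders of magnitude. The clr coordinate of the empty class drops by about (3/4)·ln(100) ≈ 3.45, and every other coordinate rises by about ln(100)/4 ≈ 1.15. Both shifts come purely from the replacement value.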

2.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool for this purpose, but it incorporates discriminant analysis, which requires computing and inverting a covariance matrix. Covariance matrices of compositional data are always singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix, and some statistical packages such as SAS do this automatically. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS with the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with the analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on raw granulometric data. This suggests that closure problems do not severely influence regionalized classification of compositional data.
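The singularity stated here follows directly from the unit-sum constraint. Every row sums to 1, so each part has zero covariance with the total. As a result, every row of the covariance matrix sums to zero and the vector of ones lies in its null space. The minimal sketch below checks this on made-up sand/silt/clay triples; these are hypothetical data, not the Darss Sill set.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]

# Hypothetical closed compositions (sand, silt, clay), each summing to 1.
data = [
    [0.70, 0.20, 0.10],
    [0.55, 0.30, 0.15],
    [0.40, 0.35, 0.25],
    [0.80, 0.15, 0.05],
    [0.30, 0.45, 0.25],
]
cov = covariance_matrix(data)
# Each row sums to (numerically) zero, i.e. cov @ [1, 1, 1] = 0: the matrix is singular.
print([round(sum(row), 12) for row in cov])
```

No ordinary inverse exists for such a matrix. That is why the discriminant analysis described above has to fall back on a pseudo-inverse.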

3.
Logratio Analysis and Compositional Distance   Citations: 10 (self: 0, by others: 10)
The concept of distance between two compositions is important in the statistical analysis of compositional data, particularly in activities such as cluster analysis and multidimensional scaling. This paper exposes the fallacies in a recent criticism of logratio-based distance measures, in particular the misstatements that logratio methods destroy distance structures and are denominator dependent. The emphasis is on ensuring that compositional data analysis involving distance concepts satisfies certain logically necessary invariance conditions. Logratio analysis and its associated distance measures satisfy these conditions.

4.
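The key task in geochemical exploration for mineral deposits is to interpret geochemical data correctly. The aim is to extract mineralization-related anomaly information precisely from complex geological information and so guide prospecting. However, geochemical data are compositional data subject to the closure effect. Multivariate statistical methods can be applied, and the true spatial distribution of the elements recovered, only after the data are properly preprocessed. In this study, 1009 surface primary-halo samples were collected from the area on the southern periphery of the Ashele copper–zinc mining district, and 13 trace elements were analyzed. Exploratory data analysis (EDA) was performed on the raw, log-transformed and ilr-transformed data to compare their spatial distributions and internal structures. (Robust) principal component analysis was then applied, together with compositional biplots and maps of first-principal-component scores. This was used to interpret how the element associations indicated by the three data types relate to mineralization information. Finally, multifractal filtering was applied to the robust principal component scores of the ilr-transformed data to separate the anomaly and background distributions of the element associations. The results are as follows. (1) Compared with the raw data, the log- and ilr-transformed data have a more uniform spatial scale and are approximately normally distributed. (2) The biplots of the three data types show that only the ilr-transformed data remove the closure effect, and the element grouping on their first principal component reveals the copper mineralization and lead–zinc polymetallic mineralization associations in the study area. Of the first-principal-component score maps based on the log and ilr transformations, the score anomalies of the latter better indicate prospecting information in the study area. (3) Three favorable prospecting targets were delineated by combining geological and prospecting information from the study area with the spatial distribution of element-association anomalies and background.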
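The ilr transformation used in this study can be computed in several equivalent orthonormal bases. The sketch below assumes pivot coordinates, one common choice; the basis actually used in the paper is not stated. It checks the property that makes ilr data suitable for PCA: Euclidean distances between ilr coordinates equal Aitchison distances, i.e. Euclidean distances between clr vectors. The function names and the toy compositions are hypothetical, and robust PCA itself is not sketched.

```python
import math

def clr(x):
    """Centred logratio coordinates (D values summing to zero)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def ilr_pivot(x):
    """ilr pivot coordinates: D - 1 unconstrained real values.
    Coordinate i contrasts part i with the geometric mean of the parts after it."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return coords

def euclid(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

x = [0.1, 0.2, 0.3, 0.4]
y = [0.25, 0.25, 0.25, 0.25]
# Both numbers are the Aitchison distance between x and y.
print(euclid(ilr_pivot(x), ilr_pivot(y)), euclid(clr(x), clr(y)))
```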

5.
Geochemical samples from part of Lake Geneva were analyzed for 29 oxides and trace elements. The variables and samples were subjected to R- and Q-mode analyses. The following techniques were applied in sequence: data transformation (normalization and standardization), data reduction (principal component and factor analysis), and automatic classification (dendrograph). The data were treated using various combinations of these techniques, and the resulting classifications were evaluated by means of several criteria. The best classification of the samples is given by a cluster analysis performed on four principal components computed from standardized variables. The discriminatory power of the variables was also measured and found to depend on their degree of intercorrelation. As a final result, the 29 original variables were reduced to four components and the sediment samples were classified into four facies, leading to easily interpretable geochemical maps.

6.
BLU Estimators and Compositional Data   Citations: 5 (self: 0, by others: 5)
One of the principal objections to the logratio approach to the statistical analysis of compositional data has been that some estimators lack unbiasedness and minimum-variance properties: they seem not to be BLU (best linear unbiased) estimators. Using a geometric approach, we introduce the concepts of metric variance and of a compositional unbiased estimator. We then show that the closed geometric mean is a c-BLU estimator (compositional best linear unbiased estimator with respect to the geometry of the simplex) of the center of the distribution of a random composition. It therefore has properties analogous to those of the arithmetic mean as a BLU estimator of the expected value in real space. The geometric approach gives real meaning to the concepts of measures of central tendency and dispersion, and it opens up a new way of understanding the statistical analysis of compositional data.
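Below is a minimal sketch of the closed geometric mean, assuming strictly positive parts. It takes the geometric mean of each part across samples, then re-closes the result to unit sum. The test also checks the simplex analogue of translation equivariance: perturbing every sample by the same vector p perturbs the centre by p. The names and data are illustrative.

```python
import math

def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(x, p):
    """Perturbation (the simplex 'addition'): component-wise product, then closure."""
    return closure([a * b for a, b in zip(x, p)])

def closed_geometric_mean(samples):
    """Centre of a compositional sample: closed component-wise geometric mean."""
    n, d = len(samples), len(samples[0])
    g = [math.exp(sum(math.log(s[j]) for s in samples) / n) for j in range(d)]
    return closure(g)

# Hypothetical three-part compositions.
samples = [[0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
print(closed_geometric_mean(samples))
```

The arithmetic mean is also a valid composition here, but it lacks this equivariance under perturbation. That is the sense in which the closed geometric mean plays the role the arithmetic mean plays in real space.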

7.
Mathematical Geosciences - Even though the logratio methodology provides a range of both generic, mostly exploratory, and purpose-built coordinate representations of compositional data, simple...

8.
Logratios and Natural Laws in Compositional Data Analysis   Citations: 1 (self: 0, by others: 1)
The impossibility of interpreting correlations of raw compositional components, and of associated statistical methods, has been clearly demonstrated over the last four decades, and an alternative statistical methodology has been developed. Despite this, a return to the traditional use of raw components has recently been advocated, and alternative methodology such as logratio analysis has been strongly criticized. This paper exposes the fallacies in this recent advocacy and demonstrates the constructive role that logratio analysis can play in geological compositional problems, in particular in the investigation of natural laws and in subcompositional investigations.
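The subcompositional point can be illustrated in a few lines. A logratio between two parts stays the same when the composition is reduced to a subcomposition and re-closed. The correlation between raw proportions, by contrast, generally changes. The sketch uses made-up four-part compositions, and `pearson` is written out to keep the block free of dependencies.

```python
import math

def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Hypothetical four-part compositions, closed to 1.
full = [closure(r) for r in ([10, 20, 30, 40], [15, 10, 25, 50],
                             [30, 25, 20, 25], [5, 30, 35, 30])]
# Subcomposition of the first three parts, re-closed.
sub = [closure(r[:3]) for r in full]

r_full = pearson([r[0] for r in full], [r[1] for r in full])
r_sub = pearson([r[0] for r in sub], [r[1] for r in sub])
print("corr(part 1, part 2): full", round(r_full, 3), "subcomposition", round(r_sub, 3))

# The logratio log(x1/x2) is the same in both, sample by sample.
for f, s in zip(full, sub):
    print(round(math.log(f[0] / f[1]), 6), round(math.log(s[0] / s[1]), 6))
```

For these numbers the raw correlation moves from roughly -0.09 to roughly -0.59, while the logratios are unchanged.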

9.
The Devonian/Carboniferous (D/C) boundary is a critical interval in Phanerozoic history. It is associated with vigorous climatic perturbations, continental glaciation, a global sea-level fall and rapidly increasing extinction rates in marine realms. In many sections worldwide, these global changes left a marked lithological signature. The clearest examples are the Hangenberg black shale (a product of deep-shelf anoxia) and the overlying Hangenberg sandstone (a sudden siliciclastic influx into predominantly carbonate depositional environments). Both layers bear a distinct geochemical signature. Either or both of these lithologies are absent at many sections, but their correlative counterparts can be indicated by subtle geochemical markers. We studied the elemental geochemistry of fourteen D/C boundary sections in six key areas across Europe, with the aim of selecting a globally correlatable elemental proxy for the D/C boundary. We first analysed the raw and log-transformed geochemical data (EDXRF, c.p.s. units), which is the standard approach here. In limestone facies, the concentrations of terrigenous elements (Al, K, Rb, Ti and Zr) turn out to be mainly controlled by dilution with Ca (carried by marine calcium carbonate). Accordingly, their variations can be related to carbonate production in the sea rather than to terrigenous input from the continent. Nevertheless, geochemical observations are relative in nature. Relying solely on statistical processing of raw data might therefore give an incomplete picture of the multivariate data structure and/or lead to biased interpretations. For this reason, this contribution discusses logratio alternatives to the standard statistical methods, which may better reflect the relative nature of the data.
For this purpose, principal component analysis was employed to reveal the main geochemical patterns. The geochemical signature of the D/C boundary was further analysed using Q-mode clustering, which leads to predicative orthonormal logratio coordinates (balances). These statistical tools give a comprehensive picture of the multivariate data structure, which makes them a primary choice for exploratory compositional data analysis. At the same time, the standard and compositional approaches turn out to have synergistic effects, and this can be exploited extensively in further geochemical studies.
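Balances are orthonormal logratio coordinates built from a partition of the parts into two groups. The minimal sketch below implements the standard balance formula, b = sqrt(r*s/(r+s)) * ln(g(x+)/g(x-)). Here g is the geometric mean of a group, and r and s are the sizes of the two groups. The partition (terrigenous elements versus Ca) and the values are hypothetical; in the study the partitions come from Q-mode clustering. Because a balance is a logratio, multiplying a whole sample by a common factor leaves it unchanged.

```python
import math

def balance(x, plus, minus):
    """Balance between the parts indexed by `plus` and those indexed by `minus`."""
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(x[i]) for i in plus) / r
    mean_log_minus = sum(math.log(x[i]) for i in minus) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)

# Hypothetical sample; parts ordered as Al, K, Rb, Ti, Zr, Ca (made-up positive values).
sample = [1200.0, 800.0, 15.0, 300.0, 40.0, 50000.0]
terrigenous = [0, 1, 2, 3, 4]
ca = [5]
print(balance(sample, terrigenous, ca))
```

With only two parts, the balance reduces to ln(x1/x2)/sqrt(2).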

10.
Outlier Detection for Compositional Data Using Robust Methods   Citations: 6 (self: 2, by others: 4)
Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation in the case of compositional data. The paper considers the family of logratio transformations (additive, centered and isometric logratio transformation). It shows that MDs based on classical estimates are invariant to these transformations. It also shows that MDs based on affine equivariant estimators of location and covariance are the same for the additive and isometric logratio transformations. Moreover, for 3-dimensional compositions the data structure can be visualized by contour lines. In higher dimensions, the MDs of closed and opened data give an impression of the multivariate data behavior.
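The check below tests the invariance claim for classical estimates. It assumes 3-part compositions, so the 2 x 2 covariance inverse can be written by hand. Squared Mahalanobis distances are computed two ways: from alr coordinates (last part as denominator) and from ilr pivot coordinates. They coincide, because one set of coordinates is a non-singular linear map of the other. The data are made up, and robust estimators such as MCD are not sketched.

```python
import math

def alr(x):
    """Additive logratio with the last part as denominator."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def ilr3(x):
    """ilr pivot coordinates for a 3-part composition."""
    a, b, c = (math.log(v) for v in x)
    return [math.sqrt(2 / 3) * (a - (b + c) / 2), math.sqrt(1 / 2) * (b - c)]

def mahalanobis_sq(points):
    """Squared Mahalanobis distances of 2-D points from their classical mean,
    using the sample covariance (n - 1 denominator) and its explicit inverse."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    out = []
    for px, py in points:
        dx, dy = px - mx, py - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out

# Hypothetical 3-part compositions; the last one is a deliberate outlier.
data = [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2],
        [0.3, 0.1, 0.6], [0.25, 0.35, 0.4], [0.8, 0.15, 0.05]]
md_alr = mahalanobis_sq([alr(x) for x in data])
md_ilr = mahalanobis_sq([ilr3(x) for x in data])
print([round(v, 6) for v in md_alr])
print([round(v, 6) for v in md_ilr])
```

As a side check, with the n - 1 covariance the squared distances always sum to (n - 1) times the dimension. Here that is 5 * 2 = 10.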

11.
Forty-six characters were measured on each of 14 Recent ostracode specimens representing 13 species collected along the British coast. Results obtained from ordination using principal components analysis agreed closely with results from cluster analysis, but ordination gave a better representation of taxonomic distances computed from the original data. Cophenetic correlations were 0.857 for the cluster analysis and 0.935 for distances computed from projections used in ordination. Characters showed considerable intercorrelation, and the first principal component was reified as a general-size factor correlated highly with height. The remaining principal components could not be reified precisely.

12.
The purpose of this study was to capture the structure of a geological process within a multivariate statistical framework. It does so using geological data generated by that process and, where applicable, by associated processes. For practitioners of statistical analysis in geology, it is important to know how far multivariate analysis can capture and explain a geological process. That analysis uses sample data (for example, chemical analyses) taken from the geological entity the process created. The process chosen for study here is the formation of a coal deposit. The data are chemical analyses of the whole coal expressed in weight percent and parts per million, and are therefore subject to the effects of the constant-sum phenomenon. This restriction was removed by using the centered logratio (clr) transformation. Applying scatter plots and principal component biplots to the raw and clr-transformed data arrays affects the interpretation and understanding of the geological process of coalification.

13.
Estimation of regionalized compositions: A comparison of three methods   Citations: 1 (self: 0, by others: 1)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant-sum constraint, is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of estimating or predicting a regionalized composition at unsampled points are discussed. (1) The direct approach estimates each variable separately. (2) The basis method is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study. (3) The logratio approach uses the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). Although there are no important numerical differences, the direct approach is shown to lead to invalid results, whereas the basis method and the additive-log-ratio approach are comparable.
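In outline, the logratio approach transforms the data to alr coordinates, estimates there, and back-transforms. The back-transform guarantees a positive estimate that sums to one. In the minimal sketch below, a fixed set of weights summing to 1 stands in for the cokriging weights. The weights, the neighbouring samples and the function names are all hypothetical, and computing real cokriging weights from a fitted cross-covariance model is not shown.

```python
import math

def alr(x):
    """Additive logratio with the last part as denominator."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(y):
    """Inverse alr (additive logistic transform): back to a closed composition."""
    e = [math.exp(v) for v in y] + [1.0]
    total = sum(e)
    return [v / total for v in e]

def estimate(samples, weights):
    """Weighted linear combination of alr coordinates, then back-transform."""
    coords = [alr(s) for s in samples]
    est = [sum(w * c[j] for w, c in zip(weights, coords)) for j in range(len(coords[0]))]
    return alr_inv(est)

# Hypothetical neighbouring 3-part compositions and stand-in weights.
neighbours = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.7, 0.2, 0.1]]
weights = [0.5, 0.3, 0.2]
print(estimate(neighbours, weights))
```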

14.
On criteria for measures of compositional difference   Citations: 4 (self: 0, by others: 4)
Simple perceptions about the nature of compositions lead, through logical necessity, to certain forms of analysis of compositional data. This paper takes the essential requirements of scale, perturbation and permutation invariance, together with that of subcompositional dominance. It applies them to the problem of characterizing change and measuring the difference between two compositions. One strongly advocated scalar measure of difference is shown to fail these tests of logical necessity. Another scalar measure, the sum of the squares of all possible logratio differences in the components of the two compositions, emerges as the simplest and most tractable measure satisfying the criteria, although it is not unique.
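The measure singled out in this abstract, the sum of squared differences of all pairwise logratios, is easy to compute directly. The function name and test vectors in the sketch below are illustrative. The accompanying checks cover the scale, perturbation and permutation invariances and the subcompositional dominance that the abstract lists.

```python
import math
from itertools import combinations

def logratio_difference(x, y):
    """Sum over all pairs i < j of (log(x_i/x_j) - log(y_i/y_j)) ** 2."""
    return sum(
        (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
        for i, j in combinations(range(len(x)), 2)
    )

x = [0.1, 0.2, 0.3, 0.4]
y = [0.4, 0.3, 0.2, 0.1]
print(logratio_difference(x, y))
```

Because only ratios enter, the compositions need not be closed first. Dropping parts removes non-negative terms from the sum, which is why the value for any subcomposition can never exceed the value for the full composition.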

15.
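Application of Principal Component Analysis to the Classification and Concentration Prediction of Geological Samples   Citations: 3 (self: 0, by others: 3)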
GAN Lu, LUO Liqiang. Rock and Mineral Analysis (岩矿测试), 1999, 20(2): 97-100
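Principal component analysis was used to study the relationship between the X-ray fluorescence spectral intensities and the concentrations of geological samples, in order to classify unknown samples and predict their concentrations. Principal component scores were computed for each sample from the standardized data, and the samples could be classified quickly from the score plot. Principal component regression was performed on the training samples to build a reduced-dimension regression model. This model was then used to predict the concentration of each component, with better results than multiple regression analysis.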

16.
In recognizing that a composition, such as a major oxide or sediment composition, provides information only about the relative, not the absolute, magnitudes of its components, this paper exposes the compositional variation array as the simplest and minimum way of summarizing the pattern of variability within a compositional data set. Such summaries are free of the notorious hazards of the constant-sum constraint and when depicted in relative variation diagrams can often provide substantial insights into the nature of the compositional variability. Concepts and practice are illustrated by reference to a number of real data sets.
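The minimal sketch below computes a variation array for a made-up three-part data set. Only logratios enter, so rescaling or closing individual samples leaves the array unchanged, and this is why such summaries avoid the constant-sum hazards. Logratio variances go above the diagonal and logratio means below it. That layout is one common choice, assumed here, and texts differ on which triangle holds which summary.

```python
import math
import statistics

def variation_array(samples):
    """Compositional variation array.

    Above the diagonal, entry [i][j] is the sample variance of log(x_i / x_j).
    Below the diagonal, entry [j][i] is the mean of log(x_j / x_i).
    """
    d = len(samples[0])
    arr = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            lr = [math.log(s[i] / s[j]) for s in samples]
            arr[i][j] = statistics.variance(lr)
            arr[j][i] = -statistics.mean(lr)  # mean of log(x_j / x_i)
    return arr

# Hypothetical three-part compositions.
samples = [[0.5, 0.3, 0.2], [0.6, 0.25, 0.15], [0.4, 0.35, 0.25], [0.55, 0.3, 0.15]]
for row in variation_array(samples):
    print([round(v, 4) for v in row])
```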

17.
Thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources.
Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA.
The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) salinity and increased water-rock interaction; 2) dissolution of carbonates; and 3) dissolution of sulfides, sulfates and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods, or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes governing the hydrochemistry of a group of thermal springs in a low-enthalpy setting.

18.
Common Principal Component Analysis generalizes standard principal components to several groups. It rests on the rigid mathematical assumption that all latent vectors (i.e., principal component directions) are equal across groups, whereas the latent roots are allowed to vary between groups (differing inflations of dispersion ellipsoids). In practice, data that fulfill these strict requirements are relatively rare. Examples from palaeontology are used to illustrate the principles. Compositional data can be made to fit the Common Principal Component (CPC) model by using the appropriate logratio covariance matrix.

19.
The analysis and interpretation of compositional data, such as major oxide compositions of rocks, has traditionally been plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and the lack of rich, tractable, parametric classes of distributions on the simplex sample space. Logistic and logratio transformations between the simplex and Euclidean space allow new concepts of covariance structure and new classes of logistic-normal distributions to be introduced. Together, these have opened up a substantial and meaningful array of statistical methodology for compositional data. Motivated by a wide variety of practical geological problems, we examine the range of possibilities offered by this new approach to the constant-sum problem.

20.
Measuring Subcompositional Incoherence   Citations: 2 (self: 0, by others: 2)
Subcompositional coherence is a fundamental property of Aitchison’s approach to compositional data analysis and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence (i.e., incoherence) can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods that might be better suited to cope with problems such as data zeros and outliers while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix that can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
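The convergence mentioned in the last sentence rests on an elementary limit. The Box-Cox power transform (x^alpha - 1)/alpha tends to ln x as alpha tends to 0, so analyses of power-transformed data approach logratio analyses for small alpha. The snippet below is a quick numerical illustration. The function name is illustrative, and the correspondence-analysis machinery itself is not sketched.

```python
import math

def box_cox(x, alpha):
    """Box-Cox power transform; equals log(x) in the limit alpha -> 0."""
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1) / alpha

x = 0.3
for alpha in (0.5, 0.1, 0.01, 0.001):
    # The gap to log(x) shrinks roughly in proportion to alpha.
    print(alpha, box_cox(x, alpha), math.log(x))
```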


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号