Similar articles
20 similar articles found.
1.
Outlier Detection for Compositional Data Using Robust Methods   (cited 6 times: 2 self-citations, 4 by others)
Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation in the case of compositional data. For the family of logratio transformations (additive, centered, and isometric logratio transformation), it is shown that MDs based on classical estimates are invariant to these transformations, and that MDs based on affine equivariant estimators of location and covariance are the same for the additive and isometric logratio transformations. Moreover, for 3-dimensional compositions the data structure can be visualized by contour lines. In higher dimensions, the MDs of closed and opened data give an impression of the multivariate data behavior.
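The invariance claim for classical estimates can be checked numerically. The following is a minimal sketch for 3-part compositions, assuming classical (non-robust) mean and covariance estimates and a small hypothetical data set; the robust, affine equivariant estimators discussed in the abstract (such as MCD) are not implemented here.

```python
import math

def alr(x):
    # additive logratio: log of each part relative to the last part
    return [math.log(xi / x[-1]) for xi in x[:-1]]

def ilr3(x):
    # pivot (isometric) logratio coordinates for a 3-part composition
    x1, x2, x3 = x
    return [math.sqrt(2 / 3) * math.log(x1 / math.sqrt(x2 * x3)),
            math.sqrt(1 / 2) * math.log(x2 / x3)]

def mahalanobis_sq(points):
    # classical mean and covariance of 2-D points, then squared MD of each point
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    # entries of the inverse 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    out = []
    for px, py in points:
        dx, dy = px - mx, py - my
        out.append(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
    return out

# hypothetical 3-part compositions
data = [(0.2, 0.3, 0.5), (0.1, 0.6, 0.3), (0.4, 0.4, 0.2),
        (0.3, 0.1, 0.6), (0.25, 0.25, 0.5)]
md_alr = mahalanobis_sq([alr(x) for x in data])
md_ilr = mahalanobis_sq([ilr3(x) for x in data])
```

Because alr and ilr coordinates are related by an invertible linear map, the classical squared distances in `md_alr` and `md_ilr` agree up to rounding.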

2.
A variety of approaches to testing distributional forms for compositional data has appeared in the literature, all based on logratio or Box–Cox transformation techniques and dependent to a degree on the divisor chosen when forming the ratios for these transformations. Recognizing the special algebraic–geometric structure of the standard simplex sample space for compositional problems, this paper uses the fundamental simplicial singular value decomposition and an associated power-perturbation characterization of compositional variability in an attempt to provide a definitive approach to such distributional testing problems. Our main consideration is the characterization and testing of the additive logistic–normal form, but we also indicate possible applications to logistic skew-normal forms once a full range of multivariate tests emerges. The testing strategy is illustrated with both simulated data and applications to some real geological compositional data sets.
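A common first step in checking additive logistic-normality is to test the logratio coordinates for normality. The sketch below does this for 2-part compositions with a Jarque–Bera statistic on the alr coordinate. This is a generic univariate check chosen for illustration, not the simplicial-SVD procedure of the paper, and with n = 5 the statistic is only indicative.

```python
import math

def alr2(x):
    # alr coordinate of a 2-part composition
    return math.log(x[0] / x[1])

def from_alr(y):
    # 2-part composition whose alr coordinate is y
    e = math.exp(y)
    return [e / (1 + e), 1 / (1 + e)]

def jarque_bera(values):
    # JB = n/6 * (S^2 + K^2/4); larger values suggest departure from normality
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2 - 3
    return n / 6 * (s * s + k * k / 4)

# hypothetical samples: symmetric vs. strongly skewed in alr space
sym = [from_alr(y) for y in (-2, -1, 0, 1, 2)]
skewed = [from_alr(y) for y in (0, 0, 0, 0, 10)]
jb_sym = jarque_bera([alr2(x) for x in sym])
jb_skewed = jarque_bera([alr2(x) for x in skewed])
```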

3.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires computing and inverting a covariance matrix. Covariance matrices of compositional data are always singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; some statistical packages, such as SAS, do this automatically. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS with the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with the analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on the raw granulometric data, suggesting that closure problems do not severely influence the regionalized classification of compositional data.
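Why the covariance matrix of closed data is singular can be seen in one line: for any component, the sum of its covariances with all components equals its covariance with the constant total, which is zero. The sketch below checks this on hypothetical three-fraction granulometric data; it does not implement a pseudo-inverse.

```python
def closure(x):
    s = sum(x)
    return [xi / s for xi in x]

def covariance(rows):
    # sample covariance matrix (n - 1 denominator)
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# hypothetical raw weights of three grain-size fractions per sample
raw = [[12.0, 30.0, 58.0], [20.0, 25.0, 40.0], [5.0, 50.0, 45.0], [33.0, 17.0, 60.0]]
closed = [closure(r) for r in raw]
cov = covariance(closed)
row_sums = [sum(row) for row in cov]   # each is cov(x_i, 1) = 0
```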

4.
Spurious Clusters in Granulometric Data Caused by Logratio Transformation   (cited 1 time: 0 self-citations, 1 by others)
The logratio transformation aims to eliminate spurious correlations between components of compositional data. When this method is applied to granulometric data, numerical problems arise with zero (empty) components. In this paper, logratio transformation with zero replacement is examined using one natural and two simulated granulometric data sets. The results show that this method generates spurious clusters and is therefore not appropriate for investigating grain-size data in particular, or compositional data with zero components in general.
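One mechanism behind such clusters is easy to reproduce. The sketch below uses a hypothetical replacement rule (put a small value `delta` in place of each zero, then re-close) and shows that every sample containing a zero lands at a logratio governed by `ln(delta)`, far from the zero-free samples. Changing `delta` by a factor of 10 moves that group by exactly ln 10.

```python
import math

def replace_and_close(x, delta):
    # hypothetical simple rule: substitute delta for every zero, then re-close
    y = [delta if xi == 0 else xi for xi in x]
    s = sum(y)
    return [yi / s for yi in y]

def logratio(x, i, j):
    return math.log(x[i] / x[j])

# hypothetical grain-size fractions; the last two samples have an empty first class
samples = [[0.30, 0.50, 0.20], [0.25, 0.55, 0.20], [0.0, 0.70, 0.30], [0.0, 0.65, 0.35]]
lr = {delta: [logratio(replace_and_close(s, delta), 0, 1) for s in samples]
      for delta in (1e-3, 1e-4)}
```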

5.
Isometric Logratio Transformations for Compositional Data Analysis   (cited 37 times: 0 self-citations, 37 by others)
Geometry in the simplex has been developed over the last 15 years, based mainly on the contributions of J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to gain further insight into some aspects of this geometry in order to clear the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called the isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This is the key to joining compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism, and orthogonality in the simplex.
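The isometry can be checked directly. The sketch below uses one particular orthonormal basis (pivot coordinates, a common choice; the paper's own basis may differ) and compares the Aitchison distance, computed through clr, with the Euclidean distance between ilr coordinates. The two compositions are hypothetical.

```python
import math

def clr(x):
    g = sum(math.log(v) for v in x) / len(x)
    return [math.log(v) - g for v in x]

def ilr(x):
    # pivot coordinates: z_i contrasts part i with the geometric mean of parts i+1..D
    D = len(x)
    z = []
    for i in range(D - 1):
        rest = x[i + 1:]
        g_rest = math.exp(sum(math.log(v) for v in rest) / len(rest))
        k = D - i - 1                      # number of parts in the denominator
        z.append(math.sqrt(k / (k + 1)) * math.log(x[i] / g_rest))
    return z

def aitchison_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = [0.1, 0.2, 0.3, 0.4]
y = [0.4, 0.3, 0.2, 0.1]
```

A D-part composition maps to D - 1 real coordinates, and distances are preserved.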

6.
Like compositions in general, regionalized compositions present the problem of spurious spatial correlation. To avoid this problem, this paper applies the additive-logratio transformation to regionalized compositions, following techniques introduced over the last few years for the statistical analysis of compositional data. This leads to an appropriate definition of a spatial covariance structure describing spatial dependence between regionalized variables subject to constant-sum constraints, under weak stationarity. Simulated data are used to illustrate the problems discussed.
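In practice, the alr route means computing experimental (cross-)semivariograms of the alr coordinates rather than of the raw parts. The following is a minimal sketch on a hypothetical, regularly spaced 1-D transect, using the classical Matheron estimator; the function names are illustrative.

```python
import math

def alr(x):
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(y):
    e = [math.exp(v) for v in y] + [1.0]
    s = sum(e)
    return [v / s for v in e]

def semivariogram(values, lag):
    # classical (Matheron) estimator for a regularly spaced transect
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

def cross_semivariogram(u, v, lag):
    idx = range(len(u) - lag)
    return sum((u[i + lag] - u[i]) * (v[i + lag] - v[i]) for i in idx) / (2 * len(idx))

# hypothetical 3-part compositions along a transect, built from known alr values
transect = [alr_inv([z, 0.5]) for z in (0.0, 1.0, 0.0, 1.0, 0.0)]
coords = [alr(c) for c in transect]
y1 = [c[0] for c in coords]
y2 = [c[1] for c in coords]
gamma_11 = semivariogram(y1, 1)
gamma_12 = cross_semivariogram(y1, y2, 1)
```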

7.
The reflectance of vitrinite (collotelinite) particles is widely used as a geothermometer for estimating the thermal maturity of organic matter enclosed in rocks. However, several problems have arisen over the last decades, which can be traced back to basically three causes: human mistakes, technical problems, and problems associated with the structural and compositional inhomogeneity of organic matter. Whilst in most cases the first two types of uncertainty can be handled by standardization, the third can cause significant problems during interpretation because it generally cannot be estimated. Vitrinite reflectance suppression and statistical problems arising from small sample sizes and outliers belong to this latter type. International standards, such as those of ASTM and ISO, define the vitrinite reflectance parameter as the statistical average of the measured data, disregarding the fact that the average is neither a resistant nor a robust statistical parameter. In other words, the average is very sensitive to outliers and to the shape of the distribution. The aim of this research was to find and test a better, more resistant, and more robust statistical parameter from traditional parametric and nonparametric statistics that can be used in practice instead of the average. Three categories of statistical problems were studied on coal and dispersed organic matter (DOM) samples: the distribution of measured values, the effect of the number of measurements, and the effect of outliers on statistical parameters. Statistical experiments carried out on numerous original and generated sample sets show that the median (med) and the most frequent value (Mn), a special weighted average, are better parameters for estimating the thermal maturity of organic matter, especially above a reflectance of 1%.
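The sensitivity of the average to a single outlier, compared with the median and a most-frequent-value estimate, can be sketched as follows. The MFV here follows Steiner's fixed-point iteration as we understand it (Cauchy-type weights with a self-adjusting scale), and the starting values are assumptions. Treat it as a sketch rather than the exact Mn used in the paper. The reflectance readings are hypothetical.

```python
import statistics

def most_frequent_value(x, iterations=100):
    # Steiner-type MFV as a fixed-point iteration (sketch; see lead-in)
    m = statistics.median(x)                 # assumed starting location
    eps2 = 3 * statistics.variance(x)        # assumed starting scale (squared)
    for _ in range(iterations):
        d2 = [(xi - m) ** 2 for xi in x]
        w = [eps2 / (eps2 + d) for d in d2]  # distant points get small weights
        m = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        d2 = [(xi - m) ** 2 for xi in x]
        num = sum(d / (eps2 + d) ** 2 for d in d2)
        den = sum(1 / (eps2 + d) ** 2 for d in d2)
        eps2 = 3 * num / den
    return m

reflectance = [0.80, 0.82, 0.78, 0.81, 0.79, 2.50]   # hypothetical %Ro readings
mean_ro = statistics.fmean(reflectance)
median_ro = statistics.median(reflectance)
mfv_ro = most_frequent_value(reflectance)
```

With one anomalous reading, the mean is pulled above 1.0, while the median and the MFV stay close to the bulk of the measurements near 0.8.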

8.
Out-of-equilibrium crystallization often produces complex compositional variability in minerals, generating zoning and other mixing phenomena. Appropriate microchemical characterization of the resulting out-of-equilibrium patterns is critical to understanding the overall physical and chemical properties of the host crystalline phases. In this framework, modeling compositional changes plays a fundamental role. However, when compositional data are handled with standard exploratory, statistical, graphical, and numerical tools, the results may be misleading because of induced correlations. To avoid these problems, one must either adopt methods that move compositional data out of their constrained space (the simplex) so that standard statistics can be applied, or use tools designed to work within the simplex geometry. A luzonite single crystal (ideal composition Cu3AsS4) exhibiting concentric and sector zoning was studied by electron probe microanalysis to understand the mechanisms that give rise to chemical variability and the conditions of the growth environment. Compositional variations were determined by collecting data along three different transects. The major and minor elements (Cu, As, S, Fe, Sb, Sn) were analyzed to characterize their patterns of association in the crystal and, hence, the crystal's evolution. The whole covariance structure, as well as the chemical relationships between successive zones, was investigated with compositional methods, using both the data-transformation and the stay-in-the-simplex approaches. Results indicate that the crystal grew under quiescent conditions, where chemical control was exercised primarily by the mineral's surface and only minor effects were due to changes in the composition of the surrounding fluid.
Consequently, an oscillatory uptake of chemical components occurred in which competition between famatinite-like (Cu3SbS4) and kuramite-like (Cu3SnS4) domains characterized the As-poor zones.
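Induced correlation can be demonstrated with simulated data. The sketch draws three independent positive variables, which are essentially uncorrelated, and closes them to proportions. For exchangeable parts summing to 1, the population correlation between any two parts is -1/(D - 1), so about -0.5 here. The data are simulated with a fixed seed and are not the luzonite measurements.

```python
import random
import statistics

def closure(x):
    s = sum(x)
    return [v / s for v in x]

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(42)
raw = [[rng.uniform(1, 2) for _ in range(3)] for _ in range(2000)]
closed = [closure(r) for r in raw]

r_raw = pearson([r[0] for r in raw], [r[1] for r in raw])            # near 0
r_closed = pearson([r[0] for r in closed], [r[1] for r in closed])   # near -0.5
```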

9.
Correlation Analysis for Compositional Data   (cited 1 time: 0 self-citations, 1 by others)
Compositional data need special treatment prior to correlation analysis. In this paper we argue why standard transformations for compositional data are not suitable for computing correlations, and why using raw or log-transformed data is not meaningful either. As a solution, a procedure based on balances is outlined, leading to sensible correlation measures. The construction of the balances is demonstrated using a real data example from geochemistry. It is shown that the considered correlation measures are invariant with respect to the choice of the binary partitions forming the balances. Robust counterparts to the classical, non-robust correlation measures are introduced and applied. Using appropriate graphical representations, we show how the resulting correlation coefficients can be interpreted.
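Balances are built from a sequential binary partition (SBP). Each partition step contrasts the geometric means of two groups of parts, scaled by sqrt(rs/(r+s)). The sketch below uses a hypothetical 4-part SBP and checks that the balances form orthonormal coordinates, so their norm equals the clr norm. It covers only balance construction, not the paper's correlation measures or their robust versions.

```python
import math

def gmean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, plus, minus):
    # balance between the parts indexed by `plus` and those indexed by `minus`
    r, s = len(plus), len(minus)
    num = gmean([x[i] for i in plus])
    den = gmean([x[i] for i in minus])
    return math.sqrt(r * s / (r + s)) * math.log(num / den)

# hypothetical SBP for four parts, e.g. (Ca, Mg, Na, K)
SBP = [([0, 1], [2, 3]), ([0], [1]), ([2], [3])]

def balances(x):
    return [balance(x, p, m) for p, m in SBP]

def clr_norm(x):
    g = math.log(gmean(x))
    return math.sqrt(sum((math.log(v) - g) ** 2 for v in x))

x = [0.4, 0.1, 0.3, 0.2]
b = balances(x)
```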

10.
Data from a microfossil group, the planktonic foraminifera, were tested to determine how well various real data distributions conform to univariate and multivariate normality, and what effect standard transformations have on the distributions. Studies of two bivariate samples, one trivariate sample, and two quadrivariate samples of size data indicate that distributions frequently deviate greatly from multivariate normality. Univariate distributions are generally positively skewed and tend toward leptokurtosis. A logarithmic transformation improved both univariate and multivariate distributions, but the number of distributions conforming to normality increased only slightly: from zero to one in the multivariate case and from one to four in the univariate case (out of 15 distributions in total). Arcsine √(p/100) transformations of percentage data in two samples including 16 and 23 species, respectively, reduced highly significant deviations from multivariate normality, but the distributions remained strongly non-normal. Although markedly positively skewed and leptokurtic univariate distributions were improved in most instances, the number of normal distributions (two) did not change. Hence neither transformation significantly increased the number of normal distributions; but if the consequences of non-normality are assumed to become less severe as the deviation from normality decreases, the transformations are justified.
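The two transformations are simple to state in code. The sketch shows the arcsine-square-root transform of a percentage and how a log transform removes the positive skew of lognormal-like values. The size values are hypothetical and chosen so that their logs are exactly symmetric.

```python
import math

def skewness(x):
    # moment-based sample skewness m3 / m2^(3/2)
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def arcsine_sqrt(p_percent):
    # arcsine-square-root transform of a percentage: arcsin(sqrt(p / 100))
    return math.asin(math.sqrt(p_percent / 100))

sizes = [math.exp(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)]   # positively skewed
skew_raw = skewness(sizes)
skew_log = skewness([math.log(s) for s in sizes])
```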

11.
Measuring Subcompositional Incoherence   (cited 2 times: 0 self-citations, 2 by others)
Subcompositional coherence is a fundamental property of Aitchison's approach to compositional data analysis and is the principal justification for using ratios of components. We maintain, however, that a lack of subcompositional coherence (i.e., incoherence) can be measured in order to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods that might be better suited to coping with problems such as data zeros and outliers while being only slightly incoherent. The proposed measure is based on the distance measure between components. We show that two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix that can be compared directly with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
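A small example shows what coherence means in practice. On hypothetical data, the Pearson correlation between raw proportions x1 and x2 changes sign when the two parts are taken as a two-part subcomposition, where it is forced to -1. The variance of ln(x1/x2) is identical in both. This illustrates coherence only; it is not the paper's stress-based incoherence measure.

```python
import math

def closure(x):
    s = sum(x)
    return [v / s for v in x]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def logratio_variance(rows, i, j):
    lr = [math.log(r[i] / r[j]) for r in rows]
    m = sum(lr) / len(lr)
    return sum((v - m) ** 2 for v in lr) / (len(lr) - 1)

full = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4], [0.25, 0.5, 0.25]]
sub = [closure(r[:2]) for r in full]   # two-part subcomposition (x1, x2)

corr_full = pearson([r[0] for r in full], [r[1] for r in full])
corr_sub = pearson([r[0] for r in sub], [r[1] for r in sub])
```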

12.
Variograms for gold and lead values from the Loraine and Prieska mines, respectively, indicate that data outliers can seriously distort and/or mask the real variogram patterns. Studies show that for these mines this problem is best overcome by logarithmic transformation of the data, and/or suitable screening out of such outliers, and/or more robust variogram estimation procedures; the benefits are particularly significant when the basic data are limited.
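The abstract does not name a specific robust estimator. As one well-known example, the Cressie–Hawkins estimator averages square roots of absolute differences instead of squared differences. The sketch compares it with the classical Matheron estimator on a hypothetical grade series containing one outlier.

```python
def matheron(z, lag):
    # classical estimator: half the mean squared difference at the given lag
    d = [z[i + lag] - z[i] for i in range(len(z) - lag)]
    return sum(v * v for v in d) / (2 * len(d))

def cressie_hawkins(z, lag):
    # robust estimator based on square roots of absolute differences
    d = [abs(z[i + lag] - z[i]) for i in range(len(z) - lag)]
    n = len(d)
    mean_root = sum(v ** 0.5 for v in d) / n
    return mean_root ** 4 / (2 * (0.457 + 0.494 / n))

grades = [1, 2, 1, 2, 1, 2, 1, 2, 1, 50, 1, 2]   # hypothetical grades with one outlier
gamma_classical = matheron(grades, 1)
gamma_robust = cressie_hawkins(grades, 1)
```

Here the single value of 50 inflates the classical lag-1 semivariance to about 214, while the robust estimate stays near 19.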

13.
14.
Genetic algorithms can solve least-squares problems in which local minima may trap more traditional methods. Although genetic algorithms are applicable to compositional as well as noncompositional data, the standard implementation treats compositional data awkwardly. The need to decode, renormalize, and then re-encode the fitted parameters to regain a composition is not only computationally costly but may also thwart convergence. A modification to the genetic algorithm, described here, adapts the tools of reproduction, crossover, and mutation to compositional data. The modification replaces crossover with a linear mixture of two parents and replaces mutation with a linear mixture of one member of the breeding population and a randomly generated individual. By using continuously evolving populations rather than discrete generations, reproduction is no longer required. As a test of this new approach, a mixture of four Gaussian functions with given means and variances is deconvolved to recover their mixing proportions.
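The modified operators keep every candidate inside the simplex, because a convex combination of two compositions is itself a composition. The sketch below is a steady-state loop with those two operators, applied to a hypothetical four-Gaussian mixture. The means, standard deviations, true proportions, grid, population size, and mutation settings are all illustrative assumptions, not values from the paper.

```python
import math
import random

rng = random.Random(1)
MEANS, SDS = [-3.0, -1.0, 1.0, 3.0], [0.6, 0.5, 0.5, 0.7]   # hypothetical components
TRUE_P = [0.1, 0.4, 0.3, 0.2]                                # hypothetical proportions
GRID = [-6 + 0.2 * k for k in range(61)]

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture(p, x):
    return sum(pk * gauss(x, mu, sd) for pk, mu, sd in zip(p, MEANS, SDS))

TARGET = [mixture(TRUE_P, x) for x in GRID]

def sse(p):
    # least-squares misfit between candidate mixture and target curve
    return sum((mixture(p, x) - t) ** 2 for x, t in zip(GRID, TARGET))

def random_composition(d):
    # flat Dirichlet draw via normalised exponentials
    e = [rng.expovariate(1.0) for _ in range(d)]
    s = sum(e)
    return [v / s for v in e]

def mix(a, b, w):
    # linear (convex) mixture of two compositions stays in the simplex
    return [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]

def evolve(pop_size=30, steps=2000, mutation_rate=0.2):
    pop = [random_composition(4) for _ in range(pop_size)]
    scores = [sse(p) for p in pop]
    start_best = min(scores)
    for _ in range(steps):
        if rng.random() < mutation_rate:
            # "mutation": mostly a population member, slightly pulled toward a random one
            child = mix(rng.choice(pop), random_composition(4), 1 - rng.uniform(0, 0.2))
        else:
            # "crossover": random linear mixture of two parents
            child = mix(rng.choice(pop), rng.choice(pop), rng.random())
        worst = max(range(pop_size), key=scores.__getitem__)
        s = sse(child)
        if s < scores[worst]:   # continuous evolution: child replaces the worst member
            pop[worst], scores[worst] = child, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], start_best

best_p, best_sse, start_sse = evolve()
```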

15.
Geological studies commonly compare the mean values of two or more compositional data suites to determine whether, how, and by how much they differ. Simple approaches for evaluating and statistically testing differences in mean values for open data fail for compositional (closed) data. A new parameter, the f-value, has therefore been developed, which correctly quantifies the differences among compositional mean values and allows those differences to be tested for statistical significance. In general, this parameter quantifies only the relative factor by which compositional variables differ across data suites; however, in situations where, arguably, at least one component has neither increased nor decreased, an absolute f-value can be computed. Where the compositional variables have undergone many perturbations, arguments based on the f-values and the central limit theorem indicate that logratios of compositional variables should be normally distributed.
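The f-value formula is not given in the abstract, so the sketch below does not reproduce it. It shows the closely related standard Aitchison tools instead: the compositional center (closed geometric means) and the perturbation difference between two centers, which likewise expresses differences only as relative factors. The data and the perturbing factor are hypothetical.

```python
import math

def closure(x):
    s = sum(x)
    return [v / s for v in x]

def center(rows):
    # compositional centre: closed vector of component-wise geometric means
    d = len(rows[0])
    g = [math.exp(sum(math.log(r[j]) for r in rows) / len(rows)) for j in range(d)]
    return closure(g)

def perturbation_difference(a, b):
    # "a minus b" in the simplex: component-wise ratio, re-closed
    return closure([ai / bi for ai, bi in zip(a, b)])

suite_a = [[0.2, 0.3, 0.5], [0.1, 0.4, 0.5], [0.3, 0.3, 0.4]]
factor = [2.0, 1.0, 1.0]   # hypothetical: first component doubled relative to the others
suite_b = [closure([f * v for f, v in zip(factor, r)]) for r in suite_a]
diff = perturbation_difference(center(suite_b), center(suite_a))
```

The result recovers the perturbing factor only up to closure, (0.5, 0.25, 0.25). This is the "relative factor" limitation the abstract describes.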

16.
The analysis and interpretation of compositional data, such as major-oxide compositions of rocks, has traditionally been plagued by the so-called constant-sum or closure problem. Particular difficulties have been the lack of a satisfactory, interpretable covariance structure and of rich, tractable parametric classes of distributions on the simplex sample space. Consideration of logistic and logratio transformations between the simplex and Euclidean space has allowed the introduction of new concepts of covariance structure and of classes of logistic-normal distributions, which have opened up a substantial and meaningful array of statistical methodology for compositional data. Motivated by a wide variety of practical geological problems, we examine the range of possibilities of this new approach to the constant-sum problem.
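The additive logistic transformation (inverse alr) maps any point of R^(D-1) onto the open simplex. Drawing the alr coordinates from a normal distribution therefore yields a logistic-normal composition. A minimal sketch follows, assuming independent coordinates (diagonal covariance) for brevity; the parameters are hypothetical.

```python
import math
import random

def alr(x):
    return [math.log(v / x[-1]) for v in x[:-1]]

def additive_logistic(y):
    # inverse alr: maps R^(D-1) onto the open simplex
    e = [math.exp(v) for v in y] + [1.0]
    s = sum(e)
    return [v / s for v in e]

def sample_logistic_normal(mu, sd, rng):
    # independent normal alr coordinates (diagonal covariance assumed)
    return additive_logistic([rng.gauss(m, s) for m, s in zip(mu, sd)])

rng = random.Random(0)
draws = [sample_logistic_normal([0.5, -0.2], [0.3, 0.4], rng) for _ in range(5)]
```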

17.
Hydraulic exponents and unit hydraulic exponents are unit-sum constrained, which requires that they be analyzed by statistical methods designed for compositional data. Although uncertainties remain regarding the choice of the best constraining operation and the method of handling departures from the unit-sum constraint, neither category of uncertainty should impede the selection of the appropriate statistical methodology. In a small-sample study, the hydraulic geometry of different types of streams was compared: (1) semi-arid: perennial vs. ephemeral; (2) tropical: Puerto Rico vs. West Malaysia; and (3) semi-arid vs. tropical (by pooling the previous data sets). All three comparisons revealed statistically significant differences in either the logratio mean vectors or the logratio covariance matrices, but not both. All six categories of data had logistic normal distributions. Because the derivatives at a given discharge of curvilinear hydraulic geometry relationships, and the hydraulic exponents on either side of the breakpoints of piecewise linear relationships, are unit-sum constrained, they too can be studied by compositional methods. However, the compositional approach is limited where distributions depart strongly from logistic normality and for streams that have negative hydraulic exponents.
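The unit-sum constraint follows from continuity. Since Q = w·d·v, we have ln w + ln d + ln v = ln Q, and because least-squares slopes are linear in the response, the fitted exponents b, f, and m of w, d, and v on Q must sum to 1, even with noisy data. The sketch checks this on synthetic at-a-station data; the coefficients, exponents, and noise level are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    # ordinary least-squares slope of y on x
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
Q = [1.5 ** k for k in range(1, 15)]                              # hypothetical discharges
w = [2.0 * q ** 0.5 * math.exp(rng.gauss(0, 0.1)) for q in Q]     # width, noisy
d = [0.4 * q ** 0.3 * math.exp(rng.gauss(0, 0.1)) for q in Q]     # depth, noisy
v = [q / (wi * di) for q, wi, di in zip(Q, w, d)]                 # velocity from continuity
logQ = [math.log(q) for q in Q]
b, f, m = (ols_slope(logQ, [math.log(t) for t in series]) for series in (w, d, v))
```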

18.
Geological data frequently have a heavy-tailed, normal-in-the-middle distribution, which gives rise to grade distributions that appear normal except for a few outliers. The same situation applies to log-transformed data to which lognormal kriging is to be applied. For such data, linear kriging is nonrobust in that (1) kriged estimates tend to infinity as the outliers do, and (2) it is no longer minimum mean squared error. The more general nonlinear method of disjunctive kriging is even less robust, computationally more laborious, and in the end need not produce better practical answers. We propose a robust kriging method for such nearly normal data based on linear kriging of an edited version of the data. It is little more laborious than conventional linear kriging and, used in conjunction with a robust variogram estimator, provides good protection against the effects of data outliers. The method is also applicable to time series analysis.
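The abstract does not spell out its editing rule, so the sketch below uses a hypothetical Huber-type rule. Each value is clipped to within `c` robust scale units (1.4826 × MAD) of the median of its neighbours along a 1-D series. The edited series would then go to ordinary linear kriging, which is not shown. The window, `c`, and data are assumptions.

```python
import statistics

def edit_outliers(z, window=2, c=3.0):
    # hypothetical Huber-type editing rule; not the paper's exact procedure
    med = statistics.median(z)
    scale = 1.4826 * statistics.median([abs(v - med) for v in z])
    edited = []
    for i, v in enumerate(z):
        lo_i, hi_i = max(0, i - window), min(len(z), i + window + 1)
        nbrs = [z[j] for j in range(lo_i, hi_i) if j != i]
        local = statistics.median(nbrs)
        # clip to [local - c*scale, local + c*scale]
        edited.append(min(max(v, local - c * scale), local + c * scale))
    return edited

grades = [1.0, 1.2, 0.9, 1.1, 10.0, 1.0, 1.05, 0.95]   # hypothetical series
edited = edit_outliers(grades)
```

Only the anomalous value is pulled in (to about 1.36); all other values pass through unchanged.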

19.
Estimation of regionalized compositions: A comparison of three methods   (cited 1 time: 0 self-citations, 1 by others)
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence, induced by the constant-sum constraint, is a spurious spatial correlation and may lead to misinterpretation of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of estimating or predicting a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can be regarded as the size of the regionalized composition under study; and (3) the logratio approach, using the additive-logratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-logratio approach are comparable.
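One way the direct approach can fail is with kriging weights, which may be negative. Weighting raw components then can yield a negative proportion, whereas weighting alr coordinates and back-transforming always returns a valid composition. The weights and samples below are hypothetical stand-ins for a real cokriging solution.

```python
import math

def alr(x):
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(y):
    e = [math.exp(v) for v in y] + [1.0]
    s = sum(e)
    return [v / s for v in e]

def direct_estimate(samples, weights):
    # approach (1): weight each raw component separately
    return [sum(w * s[j] for w, s in zip(weights, samples)) for j in range(len(samples[0]))]

def alr_estimate(samples, weights):
    # approach (3): weight the alr coordinates, then back-transform
    coords = [alr(s) for s in samples]
    y = [sum(w * c[j] for w, c in zip(weights, coords)) for j in range(len(coords[0]))]
    return alr_inv(y)

samples = [[0.05, 0.45, 0.50], [0.30, 0.30, 0.40]]
weights = [1.3, -0.3]     # hypothetical kriging weights; negative weights can occur
direct = direct_estimate(samples, weights)
via_alr = alr_estimate(samples, weights)
```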

20.
A Parametric Approach for Dealing with Compositional Rounded Zeros   (cited 2 times: 0 self-citations, 2 by others)
In this work, a parametric approach is proposed for replacing data below the detection limit, also known as rounded zeros, in compositional data sets. Compositional rounded zeros correspond to small proportions of some whole that cannot be reliably detected by the analytical instruments under the given operating conditions. Zeros of this kind appear frequently in the data collection process in the geosciences. They must be treated adequately before multivariate analysis can be applied. Our procedure is a modification of the Expectation-Maximization (EM) algorithm and is based on the additive log-ratio transformation. Its coherence with the nature of compositional data and with basic operations in the simplex sample space is checked. Using real data sets, we find that this approach improves on other parametric and non-parametric techniques for compositional rounded zeros.
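For comparison with the EM-based method, the sketch shows the simple non-parametric multiplicative replacement. It is not the paper's procedure. Each rounded zero gets a small value `delta`, and the non-zero parts are scaled down so the total stays 1, which leaves every ratio among the non-zero parts unchanged. The composition and `delta` are hypothetical, and the input is assumed to be closed to 1.

```python
def multiplicative_replacement(x, delta):
    # x is assumed closed to 1; zeros get delta, non-zero parts shrink proportionally
    zeros = sum(1 for v in x if v == 0)
    shrink = 1 - zeros * delta
    return [delta if v == 0 else v * shrink for v in x]

x = [0.0, 0.6, 0.3, 0.1]
r = multiplicative_replacement(x, 0.005)
```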


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号