
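Structural equation modeling and its use in geoscience data modeling: review and prospect
Cite this article: LIU Jiangtao, ZHAO Jie, WU Fafu. Structural equation modeling and its use in geoscience data modeling: review and prospect [J]. Journal of Geomechanics (地质力学学报), 2021, 27(3): 350-364.
Authors: LIU Jiangtao, ZHAO Jie, WU Fafu
Affiliations: Wuhan Center, China Geological Survey, Wuhan 430205, Hubei, China; School of Earth Sciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China
Funding: Geological survey projects of the China Geological Survey (DD20201153, DD20190443); foreign aid project of the Ministry of Commerce of China (201426)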
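Abstract (translated from the Chinese): Structural equation modeling (SEM) is a method for establishing, estimating, and testing causal relationships. It can stand in for multiple regression, path analysis, factor analysis, analysis of covariance, and similar methods, and it makes explicit both the effect of each individual indicator on the whole and the relationships among the indicators. It is a multivariate statistical modeling technique used mainly for confirmatory model analysis. Because it can measure latent variable scores through observable variables and analyze the joint effects among latent variables across different sub-models, SEM is widely used for data modeling and analysis in psychology, behavioral science, marketing, and related fields, where it offers a mature workflow of proposing a concept, designing a model, acquiring data, and validating the model. Geoscience data modeling has long been a focus of geoscience research. Its goal is to extract valuable model structures and latent variables from massive, multivariate, high-dimensional, multi-temporal geoscience data, and to study the interactions among geoscience variables and latent variables, in support of applications and research such as environmental management, disaster prevention, resource exploration, and ecological assessment. As the scale of geoscience data changes and modeling tools continue to develop, the samples used in geoscience data modeling are shifting from sampled data to full-sample data, the modeling approach from modeling guided by a geoscience model to unconstrained or weakly constrained modeling, the modeling basis from causal relationships among variables to correlations among variables, and model complexity from single-model, single-process modeling to integrated multi-model, multi-process modeling. As an integrated modeling method, SEM can combine several multivariate techniques at once, including factor analysis, latent variable estimation, and path analysis. This multi-level, multi-branch approach blends the characteristics of knowledge-driven and data-driven modeling. SEM faces three main challenges in geoscience data modeling: first, the shift from a mainly confirmatory mode of analysis to an exploratory one; second, the shift from modeling constrained by a complete geoscience model to modeling with weak or no model constraints; third, the shift from modeling statistical variables without spatial attributes to modeling spatial statistical variables. These shifts place new demands on both the model itself and the data modeling method. To address these three issues, the article reviews the concept and development of SEM and then presents three applications of SEM to geoscience data modeling. The first uses lake sediment geochemical data to extract endogenous geochemical ore-controlling factors for gold mineralization under weak constraints. The second uses the integrated parameter optimization of SEM, constraining the calculated posterior probability to match the observed posterior probability, to weaken and correct the effect of the evidence independence problem in the weights-of-evidence model when computing the posterior probability for gold prospecting. The third uses SEM to study forest conservation strategies in the Magdalena watershed, Mexico: by numbering forest blocks in different regions, the spatially distributed data are converted into conventional statistical variables without spatial attributes, and the effects of different environmental strategies on forest conservation are analyzed.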

Keywords (translated from the Chinese): structural equation modeling; geoscience data; modeling and analysis; big data analysis
Received: 2021/1/20
Revised: 2021/3/20

Review and prospect of structural equation modeling in geoscience data modeling and analysis
LIU Jiangtao, ZHAO Jie, WU Fafu. Review and prospect of structural equation modeling in geoscience data modeling and analysis [J]. Journal of Geomechanics, 2021, 27(3): 350-364.
Authors:LIU Jiangtao  ZHAO Jie  WU Fafu
Institution:1. Wuhan Center, China Geological Survey, Wuhan 430205, Hubei, China; 2. School of Earth Sciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China
Abstract:Structural equation modeling (SEM) is a method for establishing, estimating, and testing causal relationships. It can replace multiple regression, path analysis, factor analysis, analysis of covariance, and similar methods, and it makes explicit both the effect of each individual indicator on the whole and the relationships among the indicators. SEM is a multivariate statistical modeling technique applied mainly to confirmatory model analysis. Because it can measure latent variable scores through observable variables and analyze the joint effects among latent variables across different sub-models, SEM is widely used for data modeling and analysis in psychology, behavioral science, and marketing, where it provides a mature workflow of proposing a concept, designing a model, obtaining data, and verifying the model. Geoscience data modeling has long been a focus of geoscience research. Its purpose is to extract valuable model structures and latent variables from massive, multivariate, high-dimensional, multi-temporal geoscience data, and to study the interactions among geoscience variables and latent variables, in support of applications and research such as environmental governance, disaster prevention, resource prospecting, and ecological evaluation. With changes in the scale of geoscience data and the continuing development of modeling tools, geoscience data modeling has gradually moved from sampled data to full-sample data, from modeling guided by geological models to unconstrained or weakly constrained modeling, from a basis in causal relationships among variables to a basis in correlations among variables, and from single-model, single-process modeling to integrated multi-model, multi-process modeling. SEM is an integrated modeling method that can combine several multivariate techniques, including factor analysis, latent variable estimation, and path analysis. This multi-level, multi-branch approach combines the characteristics of knowledge-driven and data-driven modeling. SEM faces three main challenges, which are also three transitions, in geoscience data modeling: from a mainly confirmatory mode of modeling and analysis to an exploratory one; from modeling constrained by a complete geological model to modeling with weak or no model constraints; and from modeling statistical variables without spatial attributes to modeling spatial statistical variables. These transitions place new requirements on both the model itself and the data modeling method. In response to these three issues, this article reviews the concept and development of SEM and introduces three applications of SEM to geoscience data modeling. The first uses lake sediment geochemical data to extract endogenous ore-controlling factors for gold mineralization under weak constraints. The second uses the integrated parameter optimization of SEM to weaken and correct the effect of the conditional independence (CI) problem of the weights-of-evidence model on the posterior probability of gold prospecting, by constraining the calculated posterior probability to match the observed posterior probability. The third uses SEM to study forest protection strategies in the Magdalena watershed, Mexico: by numbering forest blocks in different regions, the spatially distributed data are transformed into conventional statistical variables without spatial attributes, and the impact of different environmental strategies on forest protection is analyzed.
Keywords:structural equation modeling (SEM)  geoscience data  modeling and analysis  big data analysis
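
Illustrative note: the second application above concerns the conditional independence assumption behind weights-of-evidence posterior probabilities. The following is a minimal Python sketch of that assumption only, not of the authors' SEM-based correction, which the abstract does not specify. It uses the standard weights-of-evidence formulas; the function names (`weights`, `posterior`) and all cell counts are hypothetical values chosen for illustration.

```python
import math

def weights(n_total, n_deposit, n_evidence, n_both):
    """Return (W+, W-) for one binary evidence layer from unit-cell counts."""
    p_b_d = n_both / n_deposit                               # P(B | D)
    p_b_nd = (n_evidence - n_both) / (n_total - n_deposit)   # P(B | not D)
    w_plus = math.log(p_b_d / p_b_nd)
    w_minus = math.log((1 - p_b_d) / (1 - p_b_nd))
    return w_plus, w_minus

def posterior(prior, layer_weights, present):
    """Posterior P(D | evidence), assuming the layers are conditionally independent."""
    logit = math.log(prior / (1 - prior))
    for (w_plus, w_minus), is_present in zip(layer_weights, present):
        logit += w_plus if is_present else w_minus
    return 1 / (1 + math.exp(-logit))

# Hypothetical counts: 10,000 cells, 100 with deposits,
# 2,000 with the evidence pattern, 60 with both.
n_total, n_deposit, n_evidence, n_both = 10_000, 100, 2_000, 60
prior = n_deposit / n_total
w = weights(n_total, n_deposit, n_evidence, n_both)

# One layer: the posterior equals the observed frequency 60 / 2,000.
p_one = posterior(prior, [w], [True])

# The same layer counted twice (a fully dependent "second" layer):
# the independence assumption adds its weight again and inflates the posterior.
p_dup = posterior(prior, [w, w], [True, True])

print(f"single layer: {p_one:.4f}, duplicated layer: {p_dup:.4f}")
```

With one layer the calculated posterior matches the observed proportion of deposit cells among evidence cells. Counting a perfectly dependent layer twice pushes the calculated value well above it, which is the kind of mismatch between calculated and observed posterior probability that the abstract describes correcting.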