期刊界 All Journals

1.

Sampling Bias and Class Imbalance in Maximum-likelihood Logistic Regression

Thomas Oommen, Laurie G. Baise, Richard M. Vogel 《Mathematical Geosciences》2011,43(1):99-120

Logistic regression is a widely used statistical method for relating a binary response variable to a set of explanatory variables, and maximum likelihood is the most commonly used method for parameter estimation. A maximum-likelihood logistic regression (MLLR) model predicts the probability of an event from binary data defining the event. MLLR models are currently used in a myriad of fields, including geosciences, natural hazard evaluation, medical diagnosis, homeland security, and finance. In such applications, the empirical sample data often exhibit class imbalance, where one class is represented by a large number of events while the other is represented by only a few. The data also exhibit sampling bias, which occurs when the class distribution in the sample differs from the actual class distribution in the population. Previous studies have evaluated how class imbalance and sampling bias affect the predictive capability of asymptotic classification algorithms such as MLLR, yet no definitive conclusions have been reached.
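The abstract names maximum-likelihood fitting and sampling bias without giving formulas. The following is a minimal sketch, assuming a single predictor: it fits the logistic model by Newton-Raphson maximum likelihood and then applies the prior (intercept) correction of King and Zeng (2001), a standard remedy when the event fraction in the sample differs from that in the population. The correction is illustrative and is not necessarily what Oommen et al. recommend; the function names and toy data are hypothetical.

```python
import math


def fit_logistic(x, y, iters=50):
    """Maximum-likelihood fit of P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def prior_corrected_intercept(b0, sample_event_frac, population_event_frac):
    """King-Zeng prior correction: shift an intercept fitted on a sample whose event
    fraction (ybar) differs from the population event fraction (tau)."""
    tau, ybar = population_event_frac, sample_event_frac
    return b0 - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


# Hypothetical toy data with overlapping classes, so the MLE exists
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
# Suppose events were over-sampled to 50% but occur in only 1% of the population
b0_pop = prior_corrected_intercept(b0, sample_event_frac=0.5, population_event_frac=0.01)
```

At the maximum-likelihood solution the fitted probabilities sum to the number of observed events, which is a convenient convergence check; the prior correction changes only the intercept, not the slope.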

2.

Logistic regression model for predicting the failure probability of a landslide dam (total citations: 1; self-citations: 0; citations by others: 1)

Jia-Jyun Dong, Yu-Hsiang Tung, Chien-Chih Chen, Jyh-Jong Liao, Yii-Wen Pan 《Engineering Geology》2011,117(1-2):52-61

Landslides may obstruct river flow and form landslide dams, which occur in many regions of the world. The formation and disappearance of the resulting natural lakes involve complex earth–surface processes. Lessons from many historical cases show that landslide dams usually break down rapidly, soon after the lake forms, so prompt evaluation of landslide-dam stability is crucial for hazard mitigation. Using a Japanese dataset, this study applied logistic regression and the jack-knife technique to identify the important geomorphic variables affecting landslide-dam stability: peak flow (or catchment area), dam height, width, and length, in that order. The high overall prediction power of the resulting models demonstrates their robustness, and the failure probability of a landslide dam can be evaluated with this approach. Ten landslide dams with complete dam-geometry records (formed after the 1999 Chi-Chi Earthquake, the 2008 Wenchuan Earthquake, and 2009 Typhoon Morakot) were used as examples for evaluating failure probability. The stable Tsao-Ling landslide dam, induced by the Chi-Chi earthquake, has a failure probability of 27.68% under a model incorporating catchment area and dam geometry. By contrast, the Tangjiashan landslide dam, which formed during the Wenchuan earthquake and was artificially breached soon afterwards, has a failure probability as high as 99.54%. The Siaolin landslide dam, induced by Typhoon Morakot and breached within one hour of its formation, has a failure probability of 71.09%. Notably, the failure probability of the earthquake-induced cases decreases if the catchment area in the prediction model is replaced by the peak flow of the dammed stream.
In contrast, the predicted failure probability of the heavy-rainfall-induced case increases if the high flow rate of the dammed stream is incorporated into the prediction model. Consequently, the authors suggest that the model using peak flow as a causative factor be used to evaluate landslide-dam stability whenever peak-flow data are available. Together with an estimate of the impact of an outburst flood from a landslide-dammed lake, the failure probability predicted by the proposed logistic regression model could be useful for evaluating the related risk.
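The jack-knife step can be illustrated with a leave-one-out loop in which each case is predicted by a model fitted without it. A minimal sketch, assuming one predictor, plain gradient ascent for fitting, and classification at a probability of 0.5; the function names and toy data are hypothetical and are not the Japanese dataset.

```python
import math


def fit_logit(xs, ys, lr=0.5, epochs=3000):
    """One-predictor logistic regression by gradient ascent on the mean log-likelihood.
    x is centred internally for faster convergence; returns (b0, b1) on the original x scale."""
    n = len(xs)
    m = sum(xs) / n
    a0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a0 + b1 * (x - m))))
            g0 += y - p
            g1 += (y - p) * (x - m)
        a0 += lr * g0 / n
        b1 += lr * g1 / n
    return a0 - b1 * m, b1


def jackknife_accuracy(xs, ys):
    """Leave-one-out (jack-knife) accuracy: refit without case i, then classify case i at p = 0.5."""
    hits = 0
    for i in range(len(xs)):
        b0, b1 = fit_logit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xs[i])))
        hits += int((p >= 0.5) == (ys[i] == 1))
    return hits / len(xs)


# Hypothetical toy data: one geometric predictor and dam outcome (1 = failed, 0 = stable)
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
acc = jackknife_accuracy(xs, ys)
```

Repeating the loop with different predictor subsets and comparing the jack-knife accuracies is one way to rank candidate variables.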

3.

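Probabilistic evaluation of sand liquefaction based on a logistic regression model (total citations: 2; self-citations: 1; citations by others: 1)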

潘建平, 孔宪京, 邹德高 《Rock and Soil Mechanics》2008,29(9):2567-2571

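Based on 200 sets of field liquefaction observations from 23 earthquakes in China and abroad, a liquefaction probability model relating the corrected standard penetration blow count (N1)60cs to the cyclic stress ratio CSR was built through logistic regression analysis. Taking the 50% liquefaction probability level as the boundary between liquefaction and non-liquefaction, an exponential-form expression for the cyclic resistance ratio CRR was established. The new probabilistic model predicts liquefaction and non-liquefaction of saturated sands with success rates of 85.71% and 76.14%, respectively, indicating high reliability. Compared with existing models, it uses new data and correction coefficients, eliminates some unreasonable biases, and gives overall discrimination results that are on the safe side. To link deterministic and probabilistic analysis methods, a relationship between the factor of safety against liquefaction FS and the liquefaction probability PL was established. Worked examples show that the new probabilistic model is simple, practical, and reliable.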
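The abstract does not give the fitted equation. Assuming the common form in which the logit of the liquefaction probability is linear in (N1)60cs and in ln CSR (β0, β1, β2 are placeholders, not the paper's coefficients), the exponential CRR expression and the relation between FS and PL follow in three steps: set PL = 0.5 so the logit vanishes, solve for the CSR at that point (which is CRR), and substitute back with FS = CRR/CSR.

```latex
\begin{align}
\ln\frac{P_L}{1-P_L} &= \beta_0 + \beta_1 (N_1)_{60cs} + \beta_2 \ln \mathrm{CSR} \\
\mathrm{CRR} &= \exp\!\left(-\frac{\beta_0 + \beta_1 (N_1)_{60cs}}{\beta_2}\right) \\
\ln\frac{P_L}{1-P_L} &= \beta_2\,(\ln \mathrm{CSR} - \ln \mathrm{CRR}) = -\beta_2 \ln F_S,
\qquad F_S = \frac{\mathrm{CRR}}{\mathrm{CSR}} \\
P_L &= \frac{1}{1 + F_S^{\beta_2}}
\end{align}
```

With β2 > 0, FS = 1 gives PL = 0.5, and FS > 1 gives PL < 0.5.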

4.

Probabilistic prediction of seismic liquefaction based on a Bayesian network

胡记磊, 唐小微, 裘江南 《Rock and Soil Mechanics》2016,37(6):1745-1752

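Based on interpretive structural modeling and the causal-map method, 12 representative qualitative and quantitative factors were selected, and a method was proposed for building a Bayesian network liquefaction model when much of the data is incomplete. Taking the incomplete liquefaction data from the 2011 off the Pacific coast of Tohoku earthquake in Japan as an example, the model was evaluated comprehensively with five metrics (overall accuracy, area under the ROC curve, precision, recall, and F1 score) and compared with a radial basis function (RBF) neural network model. The results show that the Bayesian network liquefaction model outperforms the RBF neural network model in both resubstitution (training-sample) classification and prediction, and it also predicts samples with missing data satisfactorily. In addition, the model is applicable to liquefaction assessment for different soil types. Because class imbalance and sampling bias can strongly affect model learning and prediction, the authors recommend assessing model quality with all five metrics together.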
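The five evaluation metrics are standard. The sketch below computes them from 0/1 labels and predicted scores, assuming a 0.5 cut-off for the threshold-based metrics and the rank (Mann-Whitney) form of the ROC AUC; the function name is hypothetical.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Overall accuracy, precision, recall and F1 at a cut-off, plus ROC AUC from ranks."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC = probability that a random positive outscores a random negative (ties count 1/2)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}
```

For example, with nine non-liquefied cases and one liquefied case, a model that scores every case 0.1 reaches 0.9 accuracy but 0 recall, 0 F1, and an AUC of 0.5, which is why accuracy alone can mislead under class imbalance.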

5.

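A rainfall-induced landslide prediction model based on effective rainfall intensity and logistic regression (total citations: 3; self-citations: 1; citations by others: 2)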

盛逸凡, 李远耀, 徐勇, 吴吉明, 林巍 《Hydrogeology & Engineering Geology》2019,(1):156-156

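Taking Sangzhi County, Zhangjiajie City, Hunan Province as the study area and drawing on a comprehensive analysis of nearly 30 years of rainfall and landslide data, the authors statistically analyzed how landslide occurrence and landslide counts relate to rainfall factors. First, the optimal regional effective-rainfall attenuation coefficient was determined, and rainfall and historical landslide records were tallied separately by landslide scale, slope gradient, and thickness, yielding a scatter plot of effective rainfall intensity (I) against duration (D). From this plot, regional effective-rainfall-intensity thresholds for triggering landslides at different probabilities were determined, and landslide hazard levels were zoned. Next, logistic regression on part of the sample data gave a landslide occurrence probability equation for the study area, together with a quantitative expression for the critical rainfall intensity. Finally, actual rainfall events that did and did not trigger landslides were compared for validation. The results show that the landslide prediction model is highly accurate and that its predictions agree well with actual conditions.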
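The abstract relies on an effective rainfall with an attenuation coefficient but does not state its formula. A commonly used form, assumed here, is R_e = Σ K^i R_i, where R_0 is the day's rainfall, R_i is the rainfall i days earlier, and 0 < K < 1; the function name is hypothetical, and the optimal K would still have to be calibrated as the authors did.

```python
def effective_rainfall(daily_rain, k):
    """Antecedent effective rainfall R_e = sum_i K**i * R_i (assumed form).
    daily_rain[0] is today's rain, daily_rain[i] the rain i days earlier."""
    if not 0.0 < k < 1.0:
        raise ValueError("attenuation coefficient K must lie in (0, 1)")
    # Older rain is discounted geometrically by the attenuation coefficient
    return sum(r * k ** i for i, r in enumerate(daily_rain))
```

With K = 0.5 and 10, 20, and 40 mm of rain today, yesterday, and two days ago, R_e = 10 + 10 + 10 = 30 mm.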

6.

Comment on “A Comparison of Modified Fuzzy Weights of Evidence, Fuzzy Weights of Evidence, and Logistic Regression for Mapping Mineral Prospectivity” by Daojun Zhang, Frits Agterberg, Qiuming Cheng, and Renguang Zuo, Math Geosci, DOI 10.1007/s11004-013-9496-8

Helmut Schaeben 《Mathematical Geosciences》2014,46(7):887-893

Zhang et al. (Math Geosci, 2013) do not define equivalence of mathematical models or methods; moreover, the “equivalence” they claim (Zhang et al., Math Geosci, 2013, pp. 6, 7, 8, 14) between modified weights-of-evidence (Agterberg, Nat Resour Res 20:95–101, 2011) and logistic regression does not generally exist. Its alleged proof rests on a previously conjectured linear relationship between weights of evidence and logistic regression parameters (Deng, Nat Resour Res 18:249–258, 2009), which does not generally exist either (Schaeben and van den Boogaart, Nat Resour Res 20:401–406, 2011). In fact, an extremely simple linear relationship exists only if the predictor variables are conditionally independent given the target variable, in which case the contrasts, i.e., the differences of the weights, equal the logistic regression parameters. Thus, weights-of-evidence is the special case of logistic regression in which the predictor variables are binary and conditionally independent given the target variable.
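With a single binary predictor, conditional independence holds trivially, so the contrast should equal the logistic slope exactly. A minimal check, assuming counts from a 2×2 table of evidence B against target D; the function names and counts are hypothetical.

```python
import math


def weights_of_evidence(n11, n10, n01, n00):
    """W+, W- and contrast C for binary evidence B and target D.
    n11: B and D, n10: B and not D, n01: not B and D, n00: not B and not D."""
    n_d, n_nd = n11 + n01, n10 + n00
    w_plus = math.log((n11 / n_d) / (n10 / n_nd))
    w_minus = math.log((n01 / n_d) / (n00 / n_nd))
    return w_plus, w_minus, w_plus - w_minus


def logistic_mle_single_binary(n11, n10, n01, n00):
    """Closed-form MLE of logit P(D) = b0 + b1*B for one binary predictor (saturated model)."""
    b0 = math.log(n01 / n00)          # log-odds of D where B = 0
    b1 = math.log(n11 / n10) - b0     # change in log-odds when B = 1
    return b0, b1
```

For counts (30, 70, 10, 890) both the contrast and the slope equal the log odds ratio ln(30·890 / (70·10)) ≈ 3.64; with several binary predictors that are not conditionally independent, the two no longer coincide, which is the commenter's point.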

7.

Evaluation of statistical bias correction methods for numerical weather prediction model forecasts of maximum and minimum temperatures

V. R. Durai, Rashmi Bhradwaj 《Natural Hazards》2014,73(3):1229-1254

Statistical bias correction methods for numerical weather prediction (NWP) forecasts of maximum and minimum temperatures over India at the medium-range time scale (up to 5 days) are proposed in this study. The objective of bias correction is to minimize the systematic error of the next forecast using the bias estimated from past errors. The need for bias correction arises from the many sources of systematic error in NWP modeling systems: NWP models have shortcomings in the physical parameterization of weather events and cannot handle sub-grid phenomena successfully. The statistical algorithms used to minimize the bias of the next forecast are running-mean (RM) bias correction, the best easy systematic estimator, simple linear regression, and the nearest neighborhood (NN) weighted mean, chosen because they are suitable for small samples. Bias correction is applied to maximum and minimum temperature forecasts from four global NWP models. The magnitude of the bias at a grid point depends on geographical location and season. The bias correction methodology is validated against daily observed maximum and minimum temperatures and bias-corrected model forecasts over India during July–September 2011. The bias-corrected NWP model forecast generally outperforms the direct model output (DMO). The spatial distribution of mean absolute error and root-mean-square error for the bias-corrected forecasts over India indicates that the RM and NN methods produce the best skill among the bias correction methods tested. The inter-comparison reveals that statistical bias correction improves the accuracy of the DMO forecast and has potential for operational applications.
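Of the four methods, running-mean (RM) correction is the simplest to show. A minimal sketch, assuming the bias is defined as forecast minus observation and averaged over a trailing window; the window length and function name are hypothetical, since the abstract gives neither.

```python
def running_mean_bias_correction(forecast, past_forecasts, past_observations, window=7):
    """Subtract the mean error (forecast - observation) over the last `window` days.
    Lists are in chronological order; the most recent day is last."""
    errors = [f - o for f, o in zip(past_forecasts, past_observations)][-window:]
    if not errors:
        return forecast  # no history yet: leave the direct model output unchanged
    return forecast - sum(errors) / len(errors)
```

Shorter windows track seasonal changes in bias faster but are noisier, which matters for the small samples the abstract mentions.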

8.

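A neural network method for predicting two-dimensional site liquefaction potential (total citations: 4; self-citations: 1; citations by others: 3)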

佘跃心 《Rock and Soil Mechanics》2004,25(10):1569-1574

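A model for predicting site liquefaction potential based on artificial neural networks is proposed. The spatial data structure of site liquefaction potential can be simulated by generalized regression neural networks (GRNN) with different parameter values. An important parameter of the model, spread, can be determined with the cross-validation technique used in geostatistical (kriging) methods. The study shows that, with the optimal spread, the GRNN maps the data structure of site liquefaction potential well, and its predictions agree closely with those of classical kriging estimation. The GRNN model can be used for predicting two-dimensional spatial data and in GIS-based decision systems.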
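A GRNN prediction is a Gaussian-kernel weighted average of the training values, and spread is the kernel width. A minimal sketch of choosing spread by leave-one-out cross-validation, assuming 2-D site coordinates, a kernel exp(-d²/(2·spread²)), and mean squared error as the criterion; toolboxes scale spread differently, and the function names are hypothetical.

```python
import math


def grnn_predict(train_pts, train_vals, query, spread):
    """GRNN prediction at a 2-D query point: Gaussian-kernel weighted mean of training values."""
    num = den = 0.0
    for (x, y), v in zip(train_pts, train_vals):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * spread ** 2))
        num += w * v
        den += w
    return num / den if den > 0.0 else float("nan")


def loo_cv_error(pts, vals, spread):
    """Leave-one-out mean squared error, as in kriging cross-validation."""
    err = 0.0
    for i in range(len(pts)):
        pred = grnn_predict(pts[:i] + pts[i + 1:], vals[:i] + vals[i + 1:], pts[i], spread)
        err += (pred - vals[i]) ** 2
    return err / len(pts)


def best_spread(pts, vals, candidates):
    """Candidate spread with the smallest leave-one-out error."""
    return min(candidates, key=lambda s: loo_cv_error(pts, vals, s))
```

Small spreads reproduce nearby samples closely but generalize poorly; large spreads smooth towards the global mean, so cross-validation picks a compromise.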

9.

Statistical analysis of rock mass fracturing

Gregory B. Baecher 《Mathematical Geosciences》1983,15(2):329-348


10.

Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada

F. Gauthier, D. Germain, B. Hétu 《Natural Hazards》2017,89(1):201-232

Snow avalanches are a major natural hazard for road users and infrastructure in northern Gaspésie. Over the past 11 years, nearly 500 snow avalanches were reported on the two major roads serving the area, yet no management program is currently operational. In this study, we analyze the weather patterns promoting snow avalanche initiation and use logistic regression (LR) to calculate the probability of avalanche occurrence on a daily basis. We then test the best LR models over the 2012–2013 season from an operational forecasting perspective: each day, the probability of occurrence (0–100%) determined by the model was classified into the five classes of an avalanche danger scale. Our results show that avalanche occurrence along the coast is best predicted by 2 days of accrued snowfall [in water equivalent (WE)], daily rainfall, and wind speed. In the valley, the most significant predictive variables are 3 days of accrued snowfall (WE), daily rainfall, and the thermal amplitude of the preceding 2 days. The large scree slopes along the coast, which are exposed to strong winds, tend to be more reactive to direct snow accumulation than the inner-valley slopes; therefore, on the coast, the probability of avalanche occurrence increases rapidly during a snowfall. The slopes in the valley are less responsive to snow loading. The LR models developed prove to be an efficient tool for forecasting days with high levels of snow avalanche activity. Finally, we discuss how road maintenance managers can use this forecasting tool to improve decision making and risk rendering on a daily basis.

11.

A GA_SVM_Adaboost model for predicting seismic liquefaction of sand

毛志勇, 黄春娟, 路世昌 《Coal Geology & Exploration》2019,47(3):166-171

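To predict sand liquefaction quickly and accurately, earthquake intensity, groundwater level, overburden thickness, SPT blow count, mean grain size, geomorphic unit, soil type, and coefficient of uniformity were selected as the main influencing factors, and correlation analysis and a factor analysis model were used to analyze them and reduce the attribute set. A genetic algorithm (GA) was used to optimize the parameters of a support vector machine (SVM), and, combined with the AdaBoost iterative algorithm, a GA_SVM_Adaboost model for predicting seismic sand liquefaction was built. The model was trained on 329 records from field survey data on sand liquefaction in the Tangshan earthquake and then used to predict the remaining 68 records. Finally, its predictions were compared with those of the GA_SVM and SVM models. The results show that the average prediction accuracies of the three models were 100%, 98.04%, and 89.71%, respectively. The factor-analysis-based GA_SVM_Adaboost model thus predicts more accurately than the GA_SVM and SVM models, is an effective method for predicting seismic sand liquefaction, and has some practical reference value.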

12.

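Exceedance probability of liquefaction-induced lateral deformation considering the random characteristics of earthquakes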

刘芳, 李震, 蒋明镜, 黄雨 《Rock and Soil Mechanics》2015,36(12):3548-3555

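Based on a practical statistical model of liquefaction-induced lateral deformation and a probabilistic earthquake model, a framework was established for an exceedance-probability model of lateral deformation that accounts for the random characteristics of earthquakes and the uncertainty of soil properties. The validity of the model was explored preliminarily through real cases, and its predictions were compared with those of existing statistical models. The analysis shows that if the conditional probability of lateral deformation follows a normal distribution, varying the standard deviation from 5% to 20% of the expected value has little effect on the displacement exceedance probability; if it follows a lognormal distribution, the standard deviation has some effect on the exceedance probability. The practical statistical model can only predict lateral deformation for a specified earthquake level, whereas the exceedance-probability model accounts for the occurrence probabilities of all possible earthquakes within a specified time and predicts both the deformation value and its probability of occurrence, making it better suited to regional seismic liquefaction hazard assessment.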
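The exceedance-probability idea reduces to the total probability theorem over earthquake scenarios. A minimal sketch, assuming mutually exclusive scenarios with known occurrence probabilities over the time window and a normal conditional distribution whose standard deviation is a fixed fraction (`cov`) of the expected displacement; a lognormal helper is included for the other case the abstract mentions. Scenario values and function names are hypothetical.

```python
import math


def normal_sf(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def lognormal_sf(x, median, beta):
    """P(X > x) when ln X ~ Normal(ln median, beta)."""
    if x <= 0.0:
        return 1.0
    return normal_sf(math.log(x), math.log(median), beta)


def exceedance_probability(d, scenarios, cov=0.1):
    """P(D > d) = sum_j P(D > d | scenario j) * P(scenario j).
    scenarios: (probability of the scenario in the time window, expected displacement);
    the conditional standard deviation is cov * expected displacement (normal case)."""
    return sum(p_eq * normal_sf(d, mu, cov * mu) for p_eq, mu in scenarios)
```

For two scenarios, (0.1, 1.0 m) and (0.02, 2.0 m), the probability of exceeding 1.0 m is about 0.1 × 0.5 + 0.02 × 1.0 ≈ 0.07.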

13.

A generalized model for evaluating area-potential in a mineral exploration program

Ramesh K. T. Reddy, George S. Koch 《Mathematical Geology》1988,20(3):227-241

A generalized model for predicting the mineral-exploration potential of geographic areas is developed using computers and mathematical techniques. A cellular approach is adopted: each area is divided into cells, and the database is transformed into computer-processable form by digitizing the data over each cell. Control cells are selected from a control area by two a priori subjective models using multiple linear regression and filtering techniques. These control cells are used to develop weighting factors for the computer-transformed variables. Cells are evaluated and predicted with an evaluation model in which the products of the weighting factors and the corresponding transformed variables are summed to give a probability score for each cell. In the example analysis, 7.76% of the cells are selected as predicted cells and checked for mining activity by comparison with a mining database. Of the predicted cells, 38.13% are classified as first-order prospects and the rest as second- and third-order prospects. The success of the prediction and the open structure of the model indicate a successful generalized model capable of evaluating large areas and predicting potential in any exploration program.

Presented at the Third Decennial International Conference on Geophysical and Geochemical Exploration for Minerals and Groundwater, Sept. 27–Oct 1, 1987, Toronto, Canada.

14.

Shaking table tests to investigate the influence of various factors on the liquefaction resistance of sands

Renjitha Mary Varghese, G. Madhavi Latha 《Natural Hazards》2014,73(3):1337-1351

This paper presents shaking table studies investigating the factors that influence the liquefaction resistance of sand. A uniaxial shaking table with a Perspex model container was used for the model tests, and saturated sand beds were prepared by wet pluviation. The models were subjected to horizontal base shaking, and the variation of pore water pressure was measured. Three series of tests, varying the acceleration and frequency of base shaking and the density of the soil, were carried out on sand beds simulating free-field conditions. Liquefaction was observed visually in some model tests and was also established through pore water pressure ratios. Effective stress was calculated at the point of pore water pressure measurement, and the number of cycles required to liquefy the sand bed was estimated and matched with visual observations. Pore water pressure varied gradually with base acceleration at a given shaking frequency, whereas its variation was not significant over the range of frequencies used in the tests. The frequency of base shaking at which the sand starts to liquefy under a given base acceleration depends on the density of the sand, and the sand did not liquefy at any frequency below this value. Liquefaction resistance improved substantially with increasing soil density, implying that soil densification is a simple technique for increasing liquefaction resistance.

15.

Robust quantification of parametric uncertainty for surfactant–polymer flooding

Ali Alkhatib, Peter King 《Computational Geosciences》2014,18(1):77-101

Uncertainty in surfactant–polymer flooding is an important challenge to the wide-scale implementation of this process. Any successful design of this enhanced oil recovery process requires a good understanding of uncertainty, so it is essential to be able to quantify this uncertainty efficiently. Monte Carlo simulation is the traditional approach for quantifying parametric uncertainty; however, it converges relatively slowly and requires a large number of realizations. This study proposes the probabilistic collocation method for parametric uncertainty quantification in surfactant–polymer flooding, using four synthetic reservoir models. Four sources of uncertainty were considered: the chemical flood residual oil saturation, surfactant adsorption, polymer adsorption, and the polymer viscosity multiplier. The output parameter approximated is the recovery factor. The output metrics were the input–output model response relationship, the probability density function, and the first two moments; these were compared with results from Monte Carlo simulation over a large number of realizations. Two methods for solving for the coefficients of the polynomial chaos expansion of the output parameter are compared: Gaussian quadrature and linear regression. The linear regression approach used two types of sampling: full-tensor product nodes and Chebyshev-derived nodes. In general, the probabilistic collocation method quantified the uncertainty in the recovery factor successfully. Gaussian quadrature produced more accurate results than linear regression with full-tensor product nodes, and linear regression with Chebyshev-derived sampling also performed relatively well. Possible enhancements to improve the performance of the probabilistic collocation method were discussed.
These enhancements include improved sparse sampling, approximation-order-independent sampling, and the use of arbitrary random input distributions that could be more representative of reality.
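A one-dimensional sketch of the probabilistic collocation idea, assuming a single standard-normal uncertain input and a cheap stand-in for the reservoir simulator (`model` is hypothetical). It shows how Gaussian quadrature yields polynomial chaos coefficients, and hence the first two moments, from only three model runs; a real study would map each physical input (for example, residual oil saturation) onto its own germ variable.

```python
import math

# 3-point Gauss-Hermite rule for a standard normal variable (probabilists' form);
# exact for polynomial integrands up to degree 5.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def hermite_e(k, xi):
    """Probabilists' Hermite polynomials He_0, He_1, He_2."""
    return (1.0, xi, xi * xi - 1.0)[k]


def pce_by_quadrature(model, order=2):
    """Project model(xi) onto He_k (order <= 2 with this rule) by Gaussian quadrature.
    Returns the chaos coefficients, the mean, and the variance."""
    runs = [model(x) for x in NODES]  # one simulator run per collocation node
    coeffs = []
    for k in range(order + 1):
        proj = sum(w * r * hermite_e(k, x) for w, r, x in zip(WEIGHTS, runs, NODES))
        coeffs.append(proj / math.factorial(k))  # E[He_k^2] = k!
    mean = coeffs[0]
    variance = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)
    return coeffs, mean, variance
```

For a model equal to ξ², the rule recovers the exact mean 1 and variance 2, whereas Monte Carlo would need many samples for comparable precision.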

16.

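Landslide susceptibility assessment of Duyun City, Guizhou Province (total citations: 6; self-citations: 1; citations by others: 5)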

任敬, 范宣梅, 赵程, 周礼, 窦向阳 《Hydrogeology & Engineering Geology》2018,(5):165-165

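Duyun City is an area of frequent urban landslide hazards in Guizhou Province. Taking Shabaobao Town (沙包堡镇) in Duyun as the study area, nine causative factors, including elevation, slope, lithology, and drainage, were extracted on grid cells, and landslide susceptibility was assessed with quantitative methods based on mathematical-statistical models (binary logistic regression and the information value model) and with a qualitative method (the analytic hierarchy process). The results show that the binary logistic regression model gave the best prediction accuracy and performance, with an area under the ROC curve (AUC) of 0.873; the high- and moderate-susceptibility zones together contained 95.41% of the predicted landslide area, and this model agreed best with field verification. The method and results can serve as a reference for landslide hazard assessment and prevention in urban areas of Guizhou.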
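The information value model named in the abstract is usually computed per factor class as I = ln[(N_i/N)/(S_i/S)], where N_i and S_i are the landslide cells and all cells in class i, and N and S are their totals; a cell's susceptibility is then the sum of its class values over all factors. A minimal sketch under that assumed definition; assigning -inf to classes without landslides is an arbitrary choice here, and the function name is hypothetical.

```python
import math


def information_value(landslide_cells, class_cells):
    """Information value per factor class: ln[(N_i / N) / (S_i / S)].
    landslide_cells[i]: landslide cells in class i; class_cells[i]: all cells in class i."""
    n_total = sum(landslide_cells)
    s_total = sum(class_cells)
    values = []
    for n_i, s_i in zip(landslide_cells, class_cells):
        if n_i == 0:
            values.append(float("-inf"))  # class never hosts a landslide
        else:
            values.append(math.log((n_i / n_total) / (s_i / s_total)))
    return values
```

Positive values mark classes in which landslides are over-represented relative to their area; in practice the -inf case is often replaced by a finite floor before summing.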

17.

Stratum identification based on multiple drilling parameters and a probabilistic classification method

梁栋才, 汤华, 吴振君, 张勇慧, 房昱纬 《Rock and Soil Mechanics》2022,43(4):1123-1134

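Traditional geological forecasting by advance drilling usually identifies strata mainly from the rate of change of a single drilling parameter. Rock breaking by the drill bit is a complex mechanical process in which the combined action of several parameters should be considered, so identifying strata from a single drilling parameter carries large uncertainty. First, the advance-drilling data were preprocessed by standardization, frequency-distribution analysis, and sensitivity analysis to select the key drilling parameters sensitive to stratum changes. Second, three composite stratum-identification indices, namely rock-breaking energy, logistic regression probability, and stratum hardness, were established from energy conservation, binary unordered logistic regression analysis, and multi-parameter variability analysis, respectively. Finally, a stratum-identification model was built with a probabilistic classification method based on Bayes' theorem, with model parameters obtained by ROC analysis. The method was applied to a tunnel project with complex geological conditions. The results show that all three composite indices identify strata well across boreholes, with identification accuracy above 80%. The rock-breaking energy and logistic regression probability indices suit cross-hole identification over relatively short distances, with average accuracies of 86.3% and 84.1%, respectively; the logistic regression probability index identifies weak interlayers well, with an accuracy of 94.2%; and the stratum hardness index suits cross-hole identification over longer distances, with a maximum limestone identification accuracy of 93.2%.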
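The abstract says ROC analysis supplies the model parameters but not which criterion is used. One common choice, assumed here, is the score cut-off that maximizes Youden's J = TPR - FPR. A minimal sketch; the function name is hypothetical.

```python
def youden_threshold(y_true, scores):
    """Score cut-off maximizing Youden's J = TPR - FPR over all observed scores.
    Cases with score >= cut-off are classified as positive (e.g. the target stratum)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / pos - fp / neg
        if j > best_j:  # ties keep the lower cut-off
            best_t, best_j = t, j
    return best_t, best_j
```

Other criteria (for example, the point closest to the top-left corner of the ROC plot) can give different cut-offs on the same data.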

18.

A sand liquefaction prediction model based on PCA-DDA and its application

宫凤强, 李嘉维 《Rock and Soil Mechanics》2016,37(Z1):448-454

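Many factors influence sand liquefaction, so multi-index liquefaction prediction models are needed. All current multi-index sand liquefaction prediction models assume by default that the selected discriminant factors are mutually independent and uncorrelated, which can cause information to overlap among the factors and lead to misjudgment. Taking 25 cases of sand liquefaction from the Tangshan earthquake as samples, eight influencing factors were selected as initial discriminant indices. Principal component analysis (PCA) was first applied to these indices to reduce the dimensionality of the highly correlated ones. New sample data were then computed from the four principal components retained, and, with 18 cases as training samples, a sand liquefaction prediction model combining PCA with distance discriminant analysis (DDA) was built. Back-predicting the 18 training cases with the model gave entirely correct results. The model was then used to predict liquefaction for the other 7 cases, and the results were compared with those of the design-code method, the Seed method, the BP (back-propagation neural network) method, and the DDA method; the PCA-DDA model achieved 100% prediction accuracy. Applied to an engineering case, the model also gave results consistent with actual conditions, indicating that it predicts well and can be used in practical engineering.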

19.

Surge dynamics of disaster displaced populations in temporary urban shelters: future challenges and management issues

Md Shahab Uddin, Mokbul Morshed Ahmad, Pennung Warnitchai 《Natural Hazards》2018,91(1):201-220

A tropical cyclone (TC) precipitation prediction scheme has been developed using a fuzzy neural network (FNN) model, with physical quantities from the NCEP/NCAR reanalysis data as potential predictors. TC precipitation samples from 172 tropical cyclones (TCs) affecting Guangxi, China, during 1980–2015 are used for model development. The FNN model input is constructed from the potential predictors by employing both a stepwise regression method (SRM) and a locally linear embedding (LLE) algorithm. The LLE algorithm can find meaningful low-dimensional structures hidden in nonlinear high-dimensional data and separate the underlying factors. In this scheme, the newly developed model, termed the FNN–LLE model, is used for daily TC precipitation prediction, from 20:00 Beijing Time (BT) of the previous day to 20:00 BT of the current day, at 89 stations covering Guangxi, China. Using identical modeling samples and independent samples, predictions of the FNN–LLE model are compared with those of the widely used SRM and of an interpolation method using the fine-mesh data of the European Centre for Medium-Range Weather Forecasts (ECMWF), in terms of TC rainfall prediction performance at the 89 stations. The root-mean-square error (RMSE), bias, and equitable threat score (ETS) were used to assess the predictions. The results show that the FNN–LLE model is superior to both the ECMWF interpolation method and SRM for TC precipitation prediction, with RMSE values of 21.94, 24.07, and 25.22 for the FNN–LLE model, the ECMWF interpolation method, and SRM, respectively. Moreover, the FNN–LLE model, with average bias and ETS values close to 1.0, gave better predictions than the ECMWF interpolation method and SRM.
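The three verification scores are standard for station forecasts. Frequency bias and ETS come from a yes/no contingency table (hits, misses, false alarms, correct negatives), so station rainfall must first be turned into a yes/no event by a threshold that the abstract does not give. A minimal sketch; the function names are hypothetical.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def bias_and_ets(hits, misses, false_alarms, correct_negatives):
    """Frequency bias and equitable threat score (Gilbert skill score) for a yes/no event."""
    n = hits + misses + false_alarms + correct_negatives
    bias = (hits + false_alarms) / (hits + misses)  # 1.0 = event forecast as often as observed
    hits_random = (hits + misses) * (hits + false_alarms) / n  # hits expected by chance
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return bias, ets
```

A perfect forecast gives bias = 1 and ETS = 1, which is why values close to 1.0 indicate better predictions.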

20.

Probabilistic modelling of joint orientation

Pinnaduwa H. S. W. Kulatilake, Tien H. Wu, Deepa N. Wathugala 《International Journal for Numerical and Analytical Methods in Geomechanics》1990,14(5):325-350

Observed frequencies of joint orientations are subject to error due to sampling bias, and this error should be corrected before statistical inference is made on the orientation distribution. Corrections (weighting functions) are developed for sampling bias in orientation for finite joints of different sizes and shapes intersecting rectangular exposures. Chi-square goodness-of-fit procedures available for the hemispherical normal and bivariate normal distributions are modified so that they apply to both raw data and data corrected for sampling bias. These corrections and procedures were applied to a joint orientation cluster to study the effect of (a) joint orientation, (b) joint size, and (c) joint shape on the statistical distribution of orientation, and all three factors were found to be significant. However, joint sizes and shapes are not currently measured in field joint surveys, so the authors recommend that they be recorded in field joint mapping surveys. Because the currently available probability distributions cannot represent all joint orientation distributions, the authors suggest either seeking new probability distributions or developing procedures for using empirical distributions in modelling orientation distributions.
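The weighting functions in this paper are derived for finite joints intersecting rectangular exposures and are not given in the abstract. For orientation only, here is a sketch of the simpler, classical Terzaghi (1965) correction for a line sample, which is a different, well-known weighting: each joint counts 1/|cos θ| times, where θ is the angle between the joint normal and the sampling line. The cap on the weight is an arbitrary assumption, and the function name is hypothetical.

```python
import math


def terzaghi_weight(normal, line, max_weight=10.0):
    """Terzaghi-type weight 1/|cos(theta)|, theta = angle between the joint normal and the
    sampling line (3-D vectors). Capped because near-parallel joints give unbounded weights."""
    dot = sum(a * b for a, b in zip(normal, line))
    norm = math.sqrt(sum(a * a for a in normal)) * math.sqrt(sum(b * b for b in line))
    cos_theta = abs(dot) / norm
    if cos_theta == 0.0:
        return max_weight  # joint parallel to the line: it can never be intersected
    return min(1.0 / cos_theta, max_weight)
```

Weighted orientation frequencies are then obtained by summing these weights instead of counting joints, before any goodness-of-fit test.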