
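Multi-model Fusion Extraction Method for Chinese Text Implicative Meteorological Disasters Event Information
Cite this article: HU Duanmu, YUAN Wu, NIU Fangqu, YUAN Wen, HAN Aiai. Multi-model Fusion Extraction Method for Chinese Text Implicative Meteorological Disasters Event Information[J]. Geo-information Science, 2022, 24(12): 2342-2355.
Authors: HU Duanmu, YUAN Wu, NIU Fangqu, YUAN Wen, HAN Aiai
Affiliations:
1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
2. University of Chinese Academy of Sciences, Beijing 100049
3. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
4. Key Laboratory of Regional Sustainable Development Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA23100103)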
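Abstract: As climate warming intensifies, extreme weather events are occurring more often worldwide, and major meteorological disasters are becoming increasingly frequent. Studying the relationship between climate change and the frequency of meteorological disasters is of great significance for disaster prevention and mitigation under climate change. Documentary literature and ubiquitous web data contain a vast number of spatiotemporal meteorological disaster events, so this paper develops a method, based on natural language processing, for automatically extracting such events from text. (1) A coarse-to-fine method is proposed for building a training corpus of annotated meteorological disaster texts from professional literature. First, to resolve the ambiguities and incompatibilities among different literature sources, a unified meteorological disaster knowledge system oriented to textual events is constructed. Next, a coarse annotation method based on chapter structure is built, and fine-grained corpus screening methods are developed for long texts (modern Chinese) and short texts (classical Chinese): one based on the Labeled LDA model for the former, and one based on TF-IDF and N-gram models for the latter. Together these allow the corpus to be built quickly. (2) Based on the BERT-CNN model, a method for automatically classifying spatiotemporal meteorological disaster events is developed; it fuses contextual semantic features with multi-granularity local semantic features and handles long and short texts in a unified way. (3) Applying this method to classical Chinese texts and to ubiquitous web data, spatiotemporal disaster events were extracted automatically with macro F1 scores of 89.09% and 80.06%, respectively, and the spatiotemporal distribution of the main meteorological disaster events is highly correlated with professional statistics. (4) Building on these results, the spatiotemporal evolution of disasters in China across historical periods is reconstructed. The volume of disaster data shows an overall gradual upward trend from period to period, and rainstorm, flood, and drought disasters are the main disaster types affecting China. The method can automatically discover events in long web texts and automatically detect events in short classical Chinese texts, providing a new technical means for applying text data to meteorological disaster research and monitoring.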
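The abstract does not say how the TF-IDF and N-gram models are combined to screen the coarsely labeled short texts. The Python snippet below is only a minimal sketch under one stated assumption: each short classical Chinese snippet is represented by a TF-IDF vector over its character unigrams and bigrams, and a snippet is kept only when it is most similar to the centroid of its own coarse label. The names `char_ngrams`, `tfidf_vectors`, `cosine`, and `screen`, the `threshold` parameter, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: TF-IDF over character N-grams used to filter a coarsely
# labeled corpus. This is an assumed design, not the paper's exact method.


def char_ngrams(text, n_values=(1, 2)):
    """Split text into character n-grams (no word segmentation is needed)."""
    grams = []
    for n in n_values:
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def tfidf_vectors(docs, n_values=(1, 2)):
    """Return one sparse TF-IDF vector (a dict) per document, with smoothed IDF."""
    counts = [Counter(char_ngrams(d, n_values)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: each gram counted once per doc
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            g: (k / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
            for g, k in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def screen(snippets, coarse_labels, threshold=0.1, n_values=(1, 2)):
    """Keep (snippet, label) pairs whose nearest class centroid agrees with the coarse label."""
    vectors = tfidf_vectors(snippets, n_values)
    sizes = Counter(coarse_labels)
    centroids = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, coarse_labels):
        for g, w in vec.items():
            centroids[label][g] += w / sizes[label]  # mean vector of each coarse class
    kept = []
    for text, vec, label in zip(snippets, vectors, coarse_labels):
        scores = {c: cosine(vec, cen) for c, cen in centroids.items()}
        best = max(scores, key=scores.get)
        if best == label and scores[label] >= threshold:
            kept.append((text, label))
    return kept


if __name__ == "__main__":
    # Toy coarse labels: the last snippet, 大旱 ("severe drought"), is mislabeled as a flood.
    snippets = ["大旱", "大旱", "大水", "大水", "大旱"]
    labels = ["drought", "drought", "flood", "flood", "flood"]
    print(screen(snippets, labels))
```

With this toy input the mislabeled drought snippet is filtered out while the consistently labeled snippets are kept. Character n-grams are used here because classical Chinese has no word delimiters; a real pipeline could equally use a segmenter, which the abstract does not specify.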

Keywords: meteorological disasters; spatiotemporal events; knowledge system; corpus; text classification; BERT-CNN model; event extraction
Received: 2022-03-02

Multi-model Fusion Extraction Method for Chinese Text Implicative Meteorological Disasters Event Information
HU Duanmu, YUAN Wu, NIU Fangqu, YUAN Wen, HAN Aiai. Multi-model Fusion Extraction Method for Chinese Text Implicative Meteorological Disasters Event Information[J]. Geo-information Science, 2022, 24(12): 2342-2355.
Authors: HU Duanmu, YUAN Wu, NIU Fangqu, YUAN Wen, HAN Aiai
Abstract: With global warming, extreme weather events and major meteorological disasters are occurring with increasing frequency worldwide. Studying the relationship between climate change and the frequency of meteorological disasters is important for disaster prevention and mitigation in the context of climate change. Because literature and web data contain a huge amount of spatiotemporal information on meteorological disasters, this paper proposes a method, based on natural language processing, for automatically extracting spatiotemporal meteorological disaster events from text. Specifically: (1) A coarse-to-fine method was proposed to build a training corpus of annotated meteorological disaster texts from professional literature. First, a unified meteorological disaster knowledge system oriented to textual events was constructed to resolve the ambiguities and incompatibilities among different literature sources. Then a coarse annotation method based on chapter structure was constructed, and fine-grained corpus screening methods were developed for long texts (modern Chinese) and short texts (classical Chinese), based on the Labeled LDA model and on TF-IDF and N-gram models, respectively, which solves the problem of building the corpus rapidly. (2) A method for automatically classifying spatiotemporal meteorological disaster events based on the BERT-CNN model was developed; it integrates contextual semantic features with multi-granularity local semantic features and processes short and long texts in a unified way. (3) Using this method, spatiotemporal meteorological disaster events were automatically extracted from classical Chinese texts and web data, with macro F1 scores of 89.09% and 80.06%, respectively. The spatiotemporal distributions of the major meteorological disaster events were highly correlated with professional statistics. (4) Based on the above results, the spatiotemporal evolution of disasters in China across historical periods was reconstructed. The overall volume of disaster data showed a gradual increasing trend from period to period, with rainstorms, floods, and droughts being the main disaster types affecting China. The method enables both the automatic discovery of events in long web texts and the automatic detection of events in short classical Chinese texts, providing a new technique for applying text data to meteorological disaster research and monitoring.
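The results above are reported as macro F1 scores. As a reminder of how that metric is usually defined, the sketch below computes per-class precision, recall, and F1 and then takes their unweighted mean, so every disaster category counts equally regardless of how many samples it has. The function name `macro_f1` and the label names are hypothetical; this is not the evaluation script used in the study.

```python
from collections import Counter

# Hypothetical sketch of the standard macro-averaged F1 metric.


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample is not p
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    y_true = ["flood", "flood", "drought", "drought", "rainstorm", "rainstorm"]
    y_pred = ["flood", "drought", "drought", "drought", "rainstorm", "flood"]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.6556 = (0.5 + 0.8 + 2/3) / 3
```

On the same input this should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`, which also averages over the union of true and predicted labels.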
Keywords: meteorological disasters; spatial and temporal events; knowledge systems; corpora; text classification; BERT-CNN models; event extraction