Geological Entity Recognition Based on the ELMO-CNN-BiLSTM-CRF Model
Citation: Chu Deping, Wan Bo, Li Hong, Fang Fang, Wang Run. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model[J]. Earth Science-Journal of China University of Geosciences, 2021, 46(8): 3039-3048.
Authors: Chu Deping  Wan Bo  Li Hong  Fang Fang  Wang Run
Affiliation: 1. School of Geography and Information Engineering, China University of Geosciences, Wuhan, Hubei 430078
Funding: National Key Research and Development Program project 2016YFB0502300; China Geological Survey project 12120114074001
Abstract: Geological entities are the key information in geological texts, and recognizing them accurately is an important prerequisite for geological information extraction and mining. This paper designs the ELMO-CNN-BiLSTM-CRF model. A deep BiLSTM-CRF neural network is built on pre-trained character vectors, and dynamic word features and character-level word features are added to compensate for the character vectors' lack of specificity. This improves recognition of complex polysemous terms in geological text and extraction of the local features of geological entities. The model is evaluated on the exploration geological report of the Xiongcun copper mine in Xietongmen County, Xizang Autonomous Region, where its precision, recall and F1 score are 95.15%, 95.26% and 95.21%, respectively. Experiments show that, compared with the BiLSTM-CRF and CNN-BiLSTM-CRF models, this model performs better at geological entity recognition on a small-scale corpus and can effectively identify long geological entity terms and geological polysemous words.

Keywords: geological big data; geological entity; named entity recognition; ELMO-CNN-BiLSTM-CRF; geological text; mathematical geology
Received: 2020-09-17
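
The abstract reports precision, recall and F1 but does not say how they are scored. The sketch below assumes entity-level exact-match scoring over BIO tags, a common convention in named entity recognition that the source does not confirm. The tag names `ROCK` and `MIN` and the helper names `bio_spans`, `span_prf1` and `f1_from` are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def span_prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span and type match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)  # a prediction counts only if boundaries and type both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    gold = ["B-ROCK", "I-ROCK", "O", "B-MIN", "O"]
    pred = ["B-ROCK", "I-ROCK", "O", "O", "B-MIN"]
    print(span_prf1(gold, pred))
    print(f1_from(95.15, 95.26))
```

Applying `f1_from` to the reported precision and recall gives about 95.20, which matches the reported F1 of 95.21% to within the rounding of the inputs. This is consistent with reading the abstract's first metric as precision rather than overall accuracy.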