Spatial Relation Recognition Based on Semantic Knowledge
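Cite this article:YUAN Yecheng, LIU Haijiang, PEI Tao, GAO Xizhang. Spatial relation recognition based on semantic knowledge [J]. 地球信息科学 (Geo-Information Science), 2014(5): 681-690.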
Authors:YUAN Yecheng; LIU Haijiang; PEI Tao; GAO Xizhang
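Affiliations:[1] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101; [2] China National Environmental Monitoring Centre, Beijing 100012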
Funding:National “863” Program (2012AA12A403).
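Abstract:Recognizing spatial relations in natural-language text (news reports, blogs, forums, social networks, etc.) is one of the important means of acquiring spatial information in the big-data era. Existing methods consider only character and word features, so the recognition process is prone to ambiguous matches. To address this limitation, this paper proposes a new spatial relation recognition method that incorporates lexical, syntactic and other semantic knowledge. The method uses a tree-structured extraction pattern: tree nodes represent spatial word types, and the links between nodes represent the dependency relations between words. The extraction patterns can be learned automatically from an annotated corpus. Pattern matching treats spatial word type and syntactic dependency relation as hard constraints and lexical semantic similarity as a soft constraint; the pattern is first converted from a tree structure into a dependency sequence, and matching is then carried out on the finite-automaton principle. Experimental results show that the precision and recall of the method are 86.67% and 63.11%, respectively. Compared with other existing rule-based methods, it has two advantages: (1) pattern learning requires no human intervention; (2) incorporating syntactic dependency relations eliminates matching ambiguity and improves recognition accuracy.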

Keywords:spatial relation recognition  automaton  spatial words  dependency relation  semantic knowledge
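
The abstract states that a tree-structured pattern is converted into a dependency sequence before matching, but it does not say how. The following is a minimal sketch of one possible conversion, assuming a depth-first traversal. The class `PatternNode`, the function `to_dependency_sequence`, the word-type names and the dependency labels (`SBV`, `VOB`, `ATT`) are all illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """One node of a tree-structured extraction pattern (hypothetical layout)."""
    word_type: str  # spatial word type, e.g. "GEO_ENTITY" (assumed label)
    children: list = field(default_factory=list)  # (dependency_label, PatternNode) pairs


def to_dependency_sequence(node):
    """Flatten the tree depth-first into (head_type, label, child_type) triples."""
    sequence = []
    for label, child in node.children:
        sequence.append((node.word_type, label, child.word_type))
        sequence.extend(to_dependency_sequence(child))
    return sequence


# Assumed pattern for a sentence shaped like "<entity> is located <direction> of <entity>".
pattern = PatternNode("SPATIAL_PREDICATE", [
    ("SBV", PatternNode("GEO_ENTITY")),
    ("VOB", PatternNode("LOCATIVE_NOUN", [("ATT", PatternNode("GEO_ENTITY"))])),
])
sequence = to_dependency_sequence(pattern)
```

Each triple records one edge of the tree, so the sequence keeps both the word types and the dependency relations that the abstract names as hard constraints.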

Spatial Relation Extraction from Chinese Characterized Documents Based on Semantic Knowledge
Authors and institutions:YUAN Yecheng, LIU Haijiang, PEI Tao and GAO Xizhang (1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China; 2. China National Environmental Monitoring Centre, Beijing 100012, China)
Abstract:Extracting spatial relations from natural-language text (news, journals, blogs, social networks, etc.) is an important way of obtaining spatial information in the era of big data. Earlier methods for extracting spatial relations from Chinese text considered only the features of characters and phrases, which easily leads to ambiguous matches. This paper presents a new rule-based method that integrates lexical, syntactic and semantic knowledge. An extraction rule in this method is composed of spatial words and the syntactic dependencies between them, which together form a tree structure: the tree nodes represent the spatial words, and the nodes are connected by syntactic dependencies. Spatial words are words that can be used to express spatial relations; they are classified into six categories: geographical entities, prepositions, locative nouns, spatial predicates, metaphorical spatial nouns and auxiliary words. During rule matching, a finite automaton is used to identify new spatial relation instances that satisfy two conditions: (1) the same syntactic dependency structure as the extraction rule; (2) similar spatial words. Part of speech and semantic similarity are used to measure the consistency between spatial words. An experiment on extracting direction relations from the Encyclopedia of China shows that the precision and recall of this method reach 86.67% and 63.11%, respectively, which is better than earlier methods. Compared with earlier methods, the improvements are: (1) generating extraction rules requires no human intervention; (2) integrating syntactic dependency knowledge reduces ambiguous matching, which clearly improves the performance of spatial relation identification.
Keywords:spatial relation extraction  finite automata  spatial word  syntactic dependence  semantic knowledge
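
The two matching conditions in the abstract (identical dependency structure, similar spatial words) can be illustrated with a small linear automaton that advances one state per matched dependency edge. This is only a sketch under stated assumptions: the `Edge` layout, the type names, the dependency labels, the threshold of 0.8 and the toy `SIMILARITY` table are all hypothetical, and the paper's actual similarity measure is not given in the abstract. The sketch also does not verify that matched edges share the same word tokens, which a full implementation would need.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One dependency edge with its words and spatial word types (assumed layout)."""
    head_word: str
    head_type: str
    label: str
    child_word: str
    child_type: str


# Toy similarity scores standing in for a real lexical resource (assumption).
SIMILARITY = {
    frozenset(("位于", "地处")): 0.9,
    frozenset(("东南", "西南")): 0.85,
}


def similarity(a, b):
    return 1.0 if a == b else SIMILARITY.get(frozenset((a, b)), 0.0)


def edge_matches(rule, edge, threshold):
    # Hard constraints: word types and dependency label must be identical.
    if (rule.head_type, rule.label, rule.child_type) != (
            edge.head_type, edge.label, edge.child_type):
        return False
    # Soft constraint: non-entity words must be semantically similar enough.
    pairs = ((rule.head_word, edge.head_word, edge.head_type),
             (rule.child_word, edge.child_word, edge.child_type))
    return all(t == "GEO_ENTITY" or similarity(r, w) >= threshold
               for r, w, t in pairs)


def match(rule_edges, sentence_edges, threshold=0.8):
    """Scan the sentence edges; state i means the first i rule edges are matched."""
    state, matched = 0, []
    for edge in sentence_edges:
        if state < len(rule_edges) and edge_matches(rule_edges[state], edge, threshold):
            matched.append(edge)
            state += 1
    return matched if state == len(rule_edges) else None


rule = [Edge("位于", "SPATIAL_PREDICATE", "SBV", "天津", "GEO_ENTITY"),
        Edge("位于", "SPATIAL_PREDICATE", "VOB", "东南", "LOCATIVE_NOUN"),
        Edge("东南", "LOCATIVE_NOUN", "ATT", "北京", "GEO_ENTITY")]
sentence = [Edge("地处", "SPATIAL_PREDICATE", "SBV", "保定", "GEO_ENTITY"),
            Edge("地处", "SPATIAL_PREDICATE", "VOB", "西南", "LOCATIVE_NOUN"),
            Edge("西南", "LOCATIVE_NOUN", "ATT", "北京", "GEO_ENTITY")]
result = match(rule, sentence)
```

Here `result` holds the three sentence edges because every edge passes both constraints. Replacing the predicate with a word that has no entry in the toy table makes `match` return `None`, since the soft constraint fails on the first edge.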
This article is indexed in CNKI, VIP (维普) and other databases.