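统计决策树下的城市地址集中文分词 (Chinese Segmentation of City Address Set Based on the Statistical Decision Tree)
Cite this article: YING Shen, LI Weiyang, HE Biao, WANG Wei, WAN Yuan. 统计决策树下的城市地址集中文分词[J]. 武汉大学学报(信息科学版) (Geomatics and Information Science of Wuhan University), 2019, 44(2): 302-309.
Authors: YING Shen (应申)  LI Weiyang (李威阳)  HE Biao (贺彪)  WANG Wei (王维)  WAN Yuan (万远)
Affiliation: 1. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, Hubei, China
Funding: National Natural Science Foundation of China (41671381, 41531177); National Key Research and Development Program of China under the 13th Five-Year Plan (2016YFF0201301, 2017YFB0503500); Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources (KF-2018-03-010)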
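Abstract (translated from the Chinese): Unlike conventional address segmentation models, which depend on a city address dictionary or a rule base, this paper proposes a segmentation method that needs no address dictionary and instead mines a massive address dataset. The method uses statistical regularities to compute how address elements are distributed across the dataset, mines the suffix points and drop points that mark segmentation positions in the address data, and builds a statistical decision tree from the relative positions of the suffix points and drop points to extract address elements. The method is validated on building address census data from Shenzhen, and the extracted elements form a useful supplement to current address and place-name gazetteers.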

Keywords (translated from the Chinese): Chinese address segmentation  decision tree  address element  address set
Received: 2017-03-28

Chinese Segmentation of City Address Set Based on the Statistical Decision Tree
YING Shen, LI Weiyang, HE Biao, WANG Wei, WAN Yuan. Chinese Segmentation of City Address Set Based on the Statistical Decision Tree[J]. Geomatics and Information Science of Wuhan University, 2019, 44(2): 302-309.
Authors:YING Shen  LI Weiyang  HE Biao  WANG Wei  WAN Yuan
Institution: 1. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China; 2. College of Architecture and Urban Planning, Shenzhen University, Shenzhen 518000, China; 3. Shenzhen Research Center of Digital City Engineering, Shenzhen 518034, China; 4. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen 518034, China; 5. Department of Urban and Environment, Hubei Normal University, Huangshi 435002, China
Abstract: Unlike conventional address word segmentation models, which rely on a city address dictionary or a rule set, this paper proposes a word segmentation method that does not depend on an address dictionary and is instead based on mining massive address data. The method uses statistical regularities to calculate the distribution of address elements in the address dataset and mines the suffix points and drop points of the address elements in the address data. It then constructs a statistical decision tree from the relative positions of these points to extract the address elements. The method is verified on building address census data from Shenzhen, and the results usefully supplement current gazetteers.
Keywords:Chinese address segmentation  decision tree  address element  address set
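
The abstract names suffix points, drop points, and a statistical decision tree but does not define them, so the Python sketch below is only one possible reading and not the authors' algorithm. It assumes that a drop point is a position where the share of addresses continuing with the same next character falls below a threshold, that a suffix character is one that frequently sits just before a drop point, and that a two-branch rule can stand in for the decision tree. The function names, the thresholds `ratio` and `min_share`, and the toy address set are all hypothetical.

```python
from collections import Counter

def prefix_counts(addresses):
    """Count how many addresses share each character-level prefix."""
    counts = Counter()
    for addr in addresses:
        for i in range(1, len(addr) + 1):
            counts[addr[:i]] += 1
    return counts

def drop_points(addr, counts, ratio=0.75):
    """Assumed definition: cut positions i where fewer than `ratio` of the
    addresses sharing addr[:i] also share the next character."""
    points = []
    for i in range(1, len(addr)):
        base = counts[addr[:i]]  # 0 for prefixes never seen in the dataset
        if base and counts[addr[:i + 1]] / base < ratio:
            points.append(i)
    return points

def frequent_suffixes(addresses, counts, ratio=0.75, min_share=0.5):
    """Assumed definition: characters that precede a drop point at least
    min_share * len(addresses) times across the dataset."""
    seen = Counter()
    for addr in addresses:
        for i in drop_points(addr, counts, ratio):
            seen[addr[i - 1]] += 1
    return {ch for ch, n in seen.items() if n >= min_share * len(addresses)}

def segment(addr, counts, suffixes, ratio=0.75):
    """Hypothetical stand-in for the decision tree: cut after a suffix
    character when a drop point sits at that position or one character later."""
    drops = set(drop_points(addr, counts, ratio))
    cuts = []
    for i in range(1, len(addr)):
        if addr[i - 1] not in suffixes:
            continue  # branch 1: not a suffix point, never cut here
        if i in drops or (i + 1) in drops:
            cuts.append(i)  # branch 2: a drop point confirms the suffix point
    bounds = [0] + cuts + [len(addr)]
    return [addr[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy dataset (invented for illustration, not the Shenzhen census data).
addresses = [
    f"深圳市{district}{road}{n}号"
    for district, roads in {
        "南山区": ["科技路", "科苑路"],
        "福田区": ["华强路", "深南路"],
        "罗湖区": ["东门路", "人民路"],
    }.items()
    for road in roads
    for n in (1, 2)
]
counts = prefix_counts(addresses)
suffixes = frequent_suffixes(addresses, counts)
print(segment("深圳市南山区科技路1号", counts, suffixes))
# ['深圳市', '南山区', '科技路', '1号']
```

In this toy set the two roads under 南山区 share their first character, so the drop point falls at position 7 (after 科) rather than at position 6 (after 区). The suffix point at position 6 corrects the cut, which is the kind of relative-position relationship between the two signals that the abstract alludes to.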
This article is indexed in CNKI, VIP (维普), and other databases.