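Corpus Construction Based on Economic Accounts Matching
Cite this article: HAN Shuai, LI Yun-ling, GUO Feng-tang. Corpus Construction Based on Economic Accounts Matching [J]. Geomatics & Spatial Information Technology (测绘与空间地理信息), 2016(4): 131-134.
Authors: HAN Shuai, LI Yun-ling, GUO Feng-tang
Affiliation: College of Geomatics, Shandong University of Science and Technology, Qingdao, Shandong 266590
Abstract: Economic accounts data matching takes the basic information held in economic accounts (经济户口), such as enterprise names and registered addresses, standardizes the place names and addresses, applies a series of matching algorithms to obtain the best-matching coordinates, and locates them on an electronic map. It is an important measure in China's push toward comprehensive digitalization. This paper systematically analyzes how economic accounts data are organized and examines the principles and characteristics of Chinese text matching. Using more than 9,000 economic accounts records from Kuiwen District, Weifang, as a training set, it designs and builds an economic accounts corpus based on a three-layer data structure of double-character hashing and arrays, and formulates three-level coding rules covering 500 industry types. Based on the textual similarity of Chinese entries and the relatedness between industries, it assigns text similarity and type similarity values, combines them into a composite similarity index by dynamic weighting, and thereby establishes a similarity matching method built on the corpus. Finally, more than 8,000 economic accounts records from Weicheng District, Weifang, serve as a test set to validate the proposed method experimentally. The results show that the corpus and the similarity matching method efficiently complete both unique matching and similarity matching of economic accounts data, greatly improve retrieval efficiency and the matching success rate, and are practical to operate.

Keywords: corpus; economic accounts; similarity matching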
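
The abstract names a three-layer structure of double-character hashing and arrays but does not spell out the layers. The following is a minimal sketch under one plausible reading: the first character of a term keys a hash table, the second character keys a nested hash table, and the remaining characters are kept in a sorted array. The class name `DoubleCharHashCorpus` and its methods are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


class DoubleCharHashCorpus:
    """Hypothetical index: first char -> second char -> sorted suffix array."""

    def __init__(self):
        # Layer 1 and layer 2 are dicts (hash tables); layer 3 is a sorted list.
        self._index = {}

    def add(self, term):
        """Insert a term of at least two characters, skipping duplicates."""
        if len(term) < 2:
            raise ValueError("terms need at least two characters")
        first, second, rest = term[0], term[1], term[2:]
        suffixes = self._index.setdefault(first, {}).setdefault(second, [])
        pos = bisect_left(suffixes, rest)
        if pos == len(suffixes) or suffixes[pos] != rest:
            suffixes.insert(pos, rest)

    def contains(self, term):
        """Exact lookup: two hash probes, then a binary search in the array."""
        if len(term) < 2:
            return False
        suffixes = self._index.get(term[0], {}).get(term[1])
        if suffixes is None:
            return False
        rest = term[2:]
        pos = bisect_left(suffixes, rest)
        return pos < len(suffixes) and suffixes[pos] == rest

    def longest_match(self, text, start=0):
        """Return the longest stored term beginning at text[start], or None."""
        if start + 2 > len(text):
            return None
        suffixes = self._index.get(text[start], {}).get(text[start + 1])
        if not suffixes:
            return None
        best = None
        for rest in suffixes:
            if text.startswith(rest, start + 2):
                candidate = text[start:start + 2 + len(rest)]
                if best is None or len(candidate) > len(best):
                    best = candidate
        return best


corpus = DoubleCharHashCorpus()
for entry in ["潍坊", "潍坊市", "奎文区"]:
    corpus.add(entry)
```

With the three sample entries loaded, `corpus.longest_match("潍坊市奎文区", 0)` picks the longer of the two entries that start with "潍坊". Keeping only the suffixes in the third layer means the two hash probes narrow the search before any string comparison is done.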

Corpus Construction and Matching Based on the Economic Accounts
HAN Shuai, LI Yun-ling, GUO Feng-tang. Corpus Construction and Matching Based on the Economic Accounts [J]. Geomatics & Spatial Information Technology, 2016(4): 131-134.
Authors: HAN Shuai, LI Yun-ling, GUO Feng-tang
Institution: College of Geomatics, Shandong University of Science and Technology
Abstract: Economic accounts data matching is the process of obtaining the best-matching coordinates for the enterprise names and addresses recorded in economic accounts, by means of address standardization and a series of matching algorithms, and positioning them on an electronic map. It is an important measure in comprehensive digital construction. This paper systematically analyzes the organizational form of economic accounts data and studies in depth the principles and characteristics of Chinese matching. Taking more than 9,000 economic accounts records from Kuiwen District of Weifang as a training set, it designs and constructs an economic accounts corpus built on a three-layer data structure of double-character hashes and arrays, and formulates three-level coding rules covering 500 kinds of industries. According to the text similarity of Chinese entries and the relationships between industries, it sets text and type similarity values and uses dynamic weighting to calculate a composite similarity index, establishing a similarity matching method based on the economic accounts corpus. Finally, more than 8,000 economic accounts records from Weicheng District of Weifang are used as a test set to verify the proposed matching method by experiment. The test results show that the economic accounts corpus and the similarity matching method can efficiently complete both unique matching and similarity matching of economic accounts data, greatly improving retrieval efficiency and the matching success rate, with good practical operability.
Keywords:corpus  economic accounts  similarity matching
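
The abstract states that a text similarity value and an industry-type similarity value are combined into a composite index by dynamic weighting, without giving the formulas. The sketch below is illustrative only and makes several assumptions that are not from the paper: text similarity is approximated with the standard library's `difflib.SequenceMatcher`, a three-level industry code is modeled as a six-digit string with two digits per level, the per-level type similarity values (1.0, 0.8, 0.5, 0.0) are invented, and the weighting rule (trust the text score more as it rises) is a hypothetical stand-in for the authors' scheme.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in for the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def type_similarity(code_a, code_b):
    """Similarity of two assumed six-digit, three-level industry codes.

    The values per shared level are illustrative assumptions.
    """
    if code_a == code_b:
        return 1.0          # same third-level (most specific) category
    if code_a[:4] == code_b[:4]:
        return 0.8          # same second-level category
    if code_a[:2] == code_b[:2]:
        return 0.5          # same first-level category only
    return 0.0              # unrelated industries


def composite_similarity(name_a, name_b, code_a, code_b, base_weight=0.5):
    """Weighted blend of text and type similarity.

    Hypothetical dynamic rule: the text weight grows from base_weight
    toward 1.0 as the text similarity itself increases.
    """
    s_text = text_similarity(name_a, name_b)
    s_type = type_similarity(code_a, code_b)
    w_text = base_weight + (1.0 - base_weight) * s_text
    return w_text * s_text + (1.0 - w_text) * s_type


same_type = composite_similarity("潍坊百货超市", "潍坊超市", "010203", "010203")
other_type = composite_similarity("潍坊百货超市", "潍坊超市", "010203", "990101")
```

Under these assumptions, two similar names score higher when their industry codes agree (`same_type`) than when the codes are unrelated (`other_type`), which is the behavior the abstract attributes to the composite index.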
This article is indexed in CNKI, Wanfang Data, and other databases.