Similar literature
 20 similar documents found; search time 15 ms
1.
Geocoding is an uncertain process that associates an address or a place name with geographic coordinates. Traditionally, geocoding is performed locally on a stand-alone computer with the geocoding tools usually bundled in GIS software packages. The use of such tools requires skillful operators who know about the issues of geocoding, that is, reference databases and complicated geocoding interpolation techniques. These days, with the advancement in the Internet and Web services technologies, online geocoding provides its functionality to Internet users with ease; thus, they are often unaware of such issues. With an increasing number of online geocoding services, which differ in their reference databases, geocoding algorithms, and strategies for dealing with inputs and outputs, it is crucial for service requestors to understand the quality of the geocoded results of each service before choosing one for their applications. This is primarily because any errors associated with the geocoded addresses will be propagated to subsequent decisions, activities, modeling, and analysis. This article examines the quality of five online geocoding services: Geocoder.us, Google, MapPoint, MapQuest, and Yahoo!. The quality of each geocoding service is evaluated with three metrics: match rate, positional accuracy, and similarity. A set of addresses from the US Environmental Protection Agency (EPA) database was used as a baseline. The results were statistically analyzed with respect to different location characteristics. The outcome of this study reveals the differences among the online geocoding services in the quality of their geocoding results, and it can be used as a general guideline for selecting a suitable service that matches an application's needs.
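Two of the metrics named in this abstract, match rate and positional accuracy, can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the use of great-circle distance as the error measure are assumptions made for illustration; the abstract does not say how the study computed its metrics, and the third metric (similarity) is not defined there, so it is omitted.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def evaluate_geocoder(results, baseline):
    """Return (match_rate, mean_error_m) for one geocoding service.

    results:  dict address -> (lat, lon), or None when the service found no match
    baseline: dict address -> (lat, lon) reference coordinates (hypothetical layout)
    """
    errors = []
    for addr, truth in baseline.items():
        hit = results.get(addr)
        if hit is not None:
            errors.append(haversine_m(hit[0], hit[1], truth[0], truth[1]))
    match_rate = len(errors) / len(baseline)
    mean_error = sum(errors) / len(errors) if errors else float("nan")
    return match_rate, mean_error
```

Running `evaluate_geocoder` once per service on the same baseline gives directly comparable match rates and mean positional errors.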

2.
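Online geocoding APIs are used to geocode massive numbers of Chinese addresses quickly. On that basis, simple rules are applied to clean and flag the geocoding results, and a classification and optimization model based on hierarchical clustering and random forests then classifies and refines the results from multiple platforms. The model was trained and validated on the addresses of theft cases in Guangzhou. The results show that, compared with the unprocessed geocoding results, the results refined by the model have a smaller overall positional error distance. The Amap (高德) geocoding service has the best geocoding quality, yet its mean geocoding error on the training sample is still as high as 590.43 m. After model optimization, the mean error of the training sample falls to 173.73 m, and the mean error of the validation sample falls from 554.88 m (Amap) to 180.04 m, a reduction of 67.49%; 90.08% of Amap's anomalous geocoding results are cleaned and refined. The optimization effect is similar for the training and validation samples, for cases with different address types, and for cases located in urban and suburban areas, which indicates that the model generalizes to some degree. The model can conveniently and quickly convert massive amounts of socioeconomic information into spatial data, improve geocoding accuracy, and provide better data support for geographic big data research.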

3.
Abstract

This paper offers a teaching strategy for incorporating TIGER/Line files into introductory GIS courses where IDRISI and OSUMAP are the primary software packages. TIGER/Line files present a valuable database for teaching GIS. The TIGER data structure aids in teaching concepts related to topological data structures, geocoding and address matching, and the files themselves provide an excellent database for laboratory exercises that incorporate census information along with environmental and natural resource data. The inability of commonly used educational software packages to import TIGER/Line files directly has been a serious impediment in an instructional context. This paper presents software developed to convert TIGER/Line files into simple polygon vector files accepted by IDRISI and OSUMAP for three alternative census geographic units (tracts, block groups and blocks). The resulting vector files are plotted for visual examination and graphical output. The vector files generated can also be imported into other GIS or computer mapping software packages.

4.
ABSTRACT

The analysis of geographically referenced data, specifically point data, is predicated on the accurate geocoding of those data. Geocoding refers to the process in which geographically referenced data (addresses, for example) are placed on a map. This process may lead to issues with positional accuracy or the inability to geocode an address. In this paper, we conduct an international investigation into the impact of the (in)ability to geocode an address on the resulting spatial pattern. We use a variety of point data sets of crime events (varying numbers of events and types of crime), a variety of areal units of analysis (varying the number and size of areal units), from a variety of countries (varying underlying administrative systems), and a locally-based spatial point pattern test to find the geocoding match rates needed to maintain the spatial patterns of the original data when addresses are missing at random. We find that the required level of geocoding success depends on the number of points and the number of areal units under analysis, but that the necessary levels are generally lower than those found in previous research. This finding is consistent across different national contexts.

5.
A Monte Carlo approach is used to evaluate the uncertainty caused by incorporating Post Office Box (PO Box) addresses in point‐cluster detection for an environmental‐health study. Placing PO Box addresses at the centroids of postcode polygons in conventional geocoding can introduce significant error into a cluster analysis of the point data generated from them. In the restricted Monte Carlo method presented in this paper, an address that cannot be matched to a precise location is assigned a random location within the smallest polygon believed to contain that address. These random locations are then combined with the locations of precisely matched addresses, and the resulting dataset is used for performing cluster analysis. After repeating this randomization‐and‐analysis process many times, one can use the variance in the calculated cluster evaluation statistics to estimate the uncertainty caused by the addresses that cannot be precisely matched. This method maximizes the use of the available spatial information, while also providing a quantitative estimate of the uncertainty in that utilization. The method is applied to lung‐cancer data from Grafton County, New Hampshire, USA, in which the PO Box addresses account for more than half of the address dataset. The results show that less than 50% of the detected cluster area can be considered to have high certainty.
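The randomization-and-analysis loop described in this abstract can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: polygons are plain lists of (x, y) vertices in planar coordinates, random locations are drawn by rejection sampling inside the polygon's bounding box, and `statistic` is a hypothetical placeholder for whatever cluster evaluation statistic an analysis uses.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(poly, rng):
    """Draw a uniform random point inside poly by rejection sampling."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            return (x, y)

def monte_carlo_statistic(precise_points, imprecise_polygons, statistic, runs=200, seed=0):
    """Repeat the randomization and return (mean, sample variance) of the statistic.

    precise_points:     list of (x, y) for precisely matched addresses
    imprecise_polygons: one containing polygon per address that could not be matched
    statistic:          hypothetical callable mapping a list of points to a number
    """
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        simulated = [random_point_in_polygon(poly, rng) for poly in imprecise_polygons]
        values.append(statistic(precise_points + simulated))
    mean = sum(values) / runs
    variance = sum((v - mean) ** 2 for v in values) / (runs - 1)
    return mean, variance
```

The variance returned by `monte_carlo_statistic` plays the role the abstract assigns to the spread of the cluster statistics: a larger variance means the imprecisely located addresses contribute more uncertainty.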

6.
张艳林  李敏  刘宇文  李佳  侯钰婧 《地理科学》2022,42(6):993-1004
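Based on the assumption that "the home address in a student's enrollment record carries the student's spatial location", home addresses of primary school students in Zhuzhou County, Hunan Province were collected from enrollment records, and the geocoding and POI search services of the Amap (高德) open platform were used to obtain the spatial locations and distribution of the county's primary school students. Shortest-path analysis and the Gaussian two-step floating catchment area method were then used to analyze the spatial accessibility of primary education resources in Zhuzhou County and its characteristics, with the aim of providing a new data source and a methodological reference for analyzing the spatial balance of regional education resources and planning their allocation. The results show that: ① Enrollment addresses combined with geocoding can capture the spatial distribution of primary school students in Zhuzhou County fairly accurately. ② The maximum, mean, and median distances to the nearest school are 11.83 km, 2.10 km, and 1.81 km, respectively, and only 55.46% of students live within 2.0 km of their nearest school, which poses a challenge for allocating education resources in a way that balances equity and efficiency. ③ In the northern urban area of Zhuzhou County, where schools are more numerous, the average distance to the nearest school is small, spatial accessibility of education resources is generally high, and spatial differences are small, so the distribution is well balanced; in the southeastern rural area, the average distance to the nearest school is large, spatial accessibility is generally low, and spatial differences are large. ④ Scenario analysis shows that, provided local student enrollment is not destabilized, adding three schools would greatly improve the average distance to the nearest school and the spatial accessibility of education resources in the southeast: the average distances in Longtan Town and Longmen Town fall from 3784 m and 3520 m to 3116 m and 2636 m, and spatial accessibility rises from 0.0492 and 0.0982 to 0.0762 and 0.1496, respectively.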

7.
8.
ABSTRACT

Address matching is a crucial step in geocoding, which plays an important role in urban planning and management. To date, the unprecedented development of location-based services has generated a large amount of unstructured address data. Traditional address matching methods mainly focus on the literal similarity of address records and are therefore not applicable to the unstructured address data. In this study, we introduce an address matching method based on deep learning to identify the semantic similarity between address records. First, we train the word2vec model to transform the address records into their corresponding vector representations. Next, we apply the enhanced sequential inference model (ESIM), a deep text-matching model, to make local and global inferences to determine if two addresses match. To evaluate the accuracy of the proposed method, we fine-tune the model with real-world address data from the Shenzhen Address Database and compare the outputs with those of several popular address matching methods. The results indicate that the proposed method achieves a higher matching accuracy for unstructured address records, with its precision, recall, and F1 score (i.e., the harmonic mean of precision and recall) reaching 0.97 on the test set.
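The abstract defines the F1 score as the harmonic mean of precision and recall. A minimal Python sketch of that definition follows; the function name is an assumption, and the zero-denominator convention is a common choice rather than something stated in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

When precision and recall are equal, as with the reported 0.97 values, the harmonic mean equals that shared value, which is consistent with all three figures being reported as 0.97.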

9.
Chinese address segmentation is a serious challenge in geographic information system geocoding. Most previous studies have relied on predefined gazetteers without considering the information contained in a raw address corpus. In this paper, a hybrid method employing both rule-based and statistical methods is proposed for Chinese address segmentation without a predefined gazetteer. This approach utilizes statistical methods to extract address information from a raw address corpus and a rule-based method to segment Chinese addresses. Two typical statistical methods and their combinations with rule-based methods are compared with the hybrid method in an experiment involving approximately 460,000 address items in Shenzhen City, China. The experimental results indicate that the proposed method achieves an F-score of over 0.8, which is better than those of existing methods, thus validating the proposed method.

10.
Differences in the reporting units of data from diverse sources and changes in units over time are common obstacles to analysis of areal data. We compare common approaches to this problem in the context of changes over time in the boundaries of U.S. census tracts. In every decennial census, many tracts are split, consolidated, or changed in other ways from the previous boundaries to reflect population growth or decline. We examine two interpolation methods to create a bridge between years, one that relies only on areal weighting and another that also introduces population weights. Results demonstrate that these approaches produce substantially different estimates for variables that involve population counts, but they have a high degree of convergence for variables defined as rates or averages. Finally, the article describes the Longitudinal Tract Database (LTDB), through which we are making available public-use tools to implement these methods to create estimates within 2010 tract boundaries for any tract-level data (from the census or other sources) that are available for prior years as early as 1970.
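The contrast between areal weighting and population weighting described in this abstract can be illustrated with a small Python sketch for count variables. The data layout (one record per intersection of a source tract with a target tract, each with an `area` and a `pop` value) and the function name are assumptions for illustration; this is not the LTDB procedure, whose details are not given in the abstract.

```python
def interpolate_counts(source_counts, pieces, weight_key):
    """Reallocate source-tract counts to target tracts.

    source_counts: dict source tract id -> count (e.g. population in an earlier census)
    pieces:        list of dicts with keys 'source', 'target', 'area', 'pop', one per
                   intersection of a source tract with a target tract (hypothetical layout)
    weight_key:    'area' for areal weighting, 'pop' for population weighting
    Assumes every source tract has a positive total weight.
    """
    totals = {}
    for p in pieces:
        totals[p["source"]] = totals.get(p["source"], 0.0) + p[weight_key]
    estimates = {}
    for p in pieces:
        share = p[weight_key] / totals[p["source"]]  # fraction of the source in this piece
        estimates[p["target"]] = (
            estimates.get(p["target"], 0.0) + source_counts[p["source"]] * share
        )
    return estimates
```

For a source tract with a count of 100 that is split into one piece with 3 units of area and 10 residents and another with 1 unit of area and 30 residents, areal weighting assigns 75 and 25 while population weighting assigns 25 and 75, the kind of divergence for count variables that the abstract reports.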

11.
The original purpose of addresses was to enable the correct and unambiguous delivery of postal mail. The advent of computers and more specifically geographic information systems (GIS) opened up a whole new range of possibilities for the use of addresses, such as routing and vehicle navigation, spatial demographic analysis, geo‐marketing, and service placement and delivery. Such functionality requires a database which can store and access spatial data effectively. In this paper we present address databases and justify the need for national address databases. We describe models used for national address databases, and present our evaluation framework for an address database at a national level within the context of a spatial data infrastructure (SDI). The models of data harvesting, federated databases and data grids are analyzed and evaluated according to our novel framework, and we show that the data grid model has some unique features that make it attractive for a national address database in an environment where centralized control and/or coordination is difficult or undesirable.

12.
《Urban geography》2013,34(7):724-738
Determining an accurate depiction of population distribution for urban areas in order to develop an improved "denominator" is important for the calculation of higher-precision rates in GIS analyses, particularly when exploring the spatial dynamics of disease. Rather than using data aggregated by arbitrary administrative boundaries such as census tracts, we developed the Cadastral-Based Expert Dasymetric System (CEDS), an interpolation method using ancillary information to delineate areas of homogeneous values. This method uses cadastral data, land-use filters, modeling by expert system routines, and validation against various census enumeration units and other data. The CEDS method is presented through a case study of asthma hospitalizations in the borough of the Bronx in New York City, in relation to proximity buffers constructed around major sources of air pollution. The analysis using CEDS shows that asthma hospitalization risk due to proximity to pollution sources is greater than previously calculated using traditional disaggregation methods.

13.
《Urban geography》2013,34(5):385-409
Quality of life cannot be fully described by the availability of regional amenities. Instead, it also depends on the quality of neighborhoods and housing. In connection with discussions on knowledge workers, not much systematic research has been done on the characteristics of neighborhoods characterized by high proportions of same-sex households. By analyzing census tracts within select counties across the United States using Poisson regressions, this article investigates what factors are related to the number of same-sex households.

14.
An Assessment and Explanation of Environmental Inequity in Baltimore   Total citations: 1, self-citations: 0, citations by others: 1
《Urban geography》2013,34(6):581-595
In Baltimore, census tracts made up of White, working-class people are more likely to contain a Toxics Release Inventory (TRI) facility than primarily Black census tracts. Differences in race characteristics decrease with larger units of analysis and with the use of half-mile buffers around TRI sites. At the census-tract level, race is the most significant population characteristic, followed by income and education. A long history of residential and occupational segregation may explain the proximity of toxic-release sites to working-class White neighborhoods.

15.
Obesity is a serious public health problem in the United States. It is important to estimate obesity prevalence at the local level to target programmatic and policy interventions. It is challenging, however, to obtain local estimates of obesity prevalence because national health surveys such as the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS) are not designed to produce direct estimates at the local levels (e.g. census tracts) due to small population samples and the need to preserve individual confidentiality. In this study we address the problem of estimating local obesity prevalence rates by implementing a spatial microsimulation modeling technique to proportionally replicate the demographic characteristics of BRFSS respondents to census tract populations in metropolitan Detroit. Obesity prevalence rates are examined for high and low spatial clusters and studied in relation to the U.S. Department of Agriculture's (USDA) measures of low-income neighborhoods and local food deserts and CDC's measure of healthy and less healthy food environments currently used to target obesity reduction initiatives. This study found that obesity prevalence was largely clustered in the City of Detroit extending north into contiguous suburbs. The spatial patterns of highest obesity prevalence tracts were most similarly aligned with USDA-defined low-income tracts and CDC's less healthy food tracts. The locations of USDA's food desert tracts rarely overlapped with the highest obesity prevalence tracts. This study demonstrated a new methodology by which to assess local areas in need of future obesity interventions.

16.
Several spatial measures of community food access identifying so-called “food deserts” have been developed based on geospatial information and commercially available secondary data listings of food retail outlets. It is not known how data inaccuracies influence the designation of Census tracts as areas of low access. This study replicated the U.S. Department of Agriculture Economic Research Service (USDA ERS) food desert measure and the Centers for Disease Control and Prevention (CDC) non-healthier food retail tract measure in two secondary data sources (InfoUSA and Dun & Bradstreet) and reference data from an eight-county field census covering 169 Census tracts in South Carolina. For the USDA ERS food desert measure, accuracy statistics for the secondary data sources were 94% concordance, 50–65% sensitivity, and 60–64% positive predictive value (PPV). For the CDC non-healthier food retail tract measure, both secondary data sources demonstrated 88–91% concordance, 80–86% sensitivity, and 78–82% PPV. While inaccuracies in secondary data sources used to identify low food access areas may be acceptable for large-scale surveillance, verification with field work is advisable for local community efforts aimed at identifying and improving food access.

17.
《Urban geography》2013,34(1):55-79
Relatively few factorial ecologies have explored either the consistency of the social dimensionality of urban areas in more than a few cities or the separation of city-specific from general effects. This study of almost 3,000 census tracts in all 24 Canadian metropolitan areas (CMAs) used 35 variables from 1981 census data to solve these problems. It shows there is a persistent similarity in six of the seven to nine dimensions found in separate analyses of three city size categories: over 1 million; 0.5-1 million; 100-500 thousand people. From this basis a combined study of all the centers shows that 85% of the variability can be summarized by nine dimensions called Economic Status, Impoverishment, Ethnicity, Early and Late Family, Family/Age, Pre-Family, Non-Family, Housing, and Migrant Status. The evidence for several different family-related axes illustrates the increasing complexity of the social dimensionality of modern cities based on family differentiation. F-ratio values and Eta coefficients are used to show that all the first-order axes, except Migration and Ethnicity, have much greater variability within, rather than between the cities, demonstrating the general rather than the city-specific nature of these dimensions. An analysis of the highest scoring tracts on the axes demonstrates the way in which some CMAs have relatively high incidences of some of the characteristics, thereby identifying the particular characteristics of many centers.

18.
19.
Abstract

Remotely-sensed data constitute a major potential source of input to geographical information systems (GIS). However, these data often have a relatively poor classification accuracy compared with that of the cartographic data from maps with which they may be combined in the course of GIS analysis. The possibility exists of using data sets (in the form of digital maps) resident within a GIS in order to improve this accuracy, before the classified image is incorporated into the GIS. Results are discussed from a British Alvey Information Technology project to develop a system for the knowledge-based segmentation and classification of remotely-sensed terrain images, in which the knowledge contained in digital map

20.
This article addresses the issue of linking temporal and spatial information into a GIS database structure to investigate the land-use changes in a rural-urban region over a thirty-five-year period. More specifically, it describes the application of a programming package developed to build temporal topology in a historical land-use GIS database to efficiently perform spatiotemporal queries. The program was created within the MapInfo environment using the MapBasic language. Different types of information, such as the rate of change, the relationship between the change of land use and zoning regulations, and land-use succession, were extracted from the database. A user-friendly interface was also developed to make it easy to submit spatiotemporal queries to the database. This approach offers a flexible, high-performing tool for scientists and planners who need to efficiently capture essential spatiotemporal information required for geographical inquiry and decision-making.
