Similar literature
 Found 20 similar articles; search took 62 ms
1.
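The Apache Spark distributed computing framework can be used to manage and process big spatial data, and it provides a base platform for implementing cloud GIS. Targeting Apache Spark's data organization and computing model, and combining it with the Apache HBase distributed database, this work starts from the concept of a distributed GIS kernel to design a distributed spatial data storage structure and object interfaces, and implements them on the kernel of a Chinese-developed GIS platform. A series of comparative experiments on storing and querying point, line, and polygon data against the traditional spatial database system PostGIS verified the feasibility and efficiency of the proposed distributed spatial data storage architecture.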

2.
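With the establishment of a stereoscopic Earth observation system, remote sensing big data keeps accumulating. The traditional file-based, scene- or sheet-based organization of imagery lacks a sufficiently unified spatio-temporal reference, and centralized storage hinders large-scale parallel analysis. Earth observation big data analysis still lacks a unified data model and infrastructure theory. In recent years, research on data cubes has opened up prospects for big data analysis infrastructure in the Earth observation field. Given a unified, analysis-ready multidimensional data model and integrated Earth observation data analysis functions, an Earth observation big data analysis infrastructure can be built on data cubes. This paper therefore proposes a multi-source Earth observation spatio-temporal cube for large-scale analysis. Compared with existing data cube approaches, it emphasizes unified organization of multi-source data, a cloud-computing-based cube processing mode, and cube computation optimized by artificial intelligence. The research helps build a new framework for spatio-temporal big data analysis, establishes a link to data cubes in the business intelligence field, and provides a unified spatio-temporal organization model for spatio-temporal big data, supporting fast, large-scale analysis of Earth observation data over large areas and long time series. A performance comparison with an open-source data cube shows that the proposed multi-source Earth observation spatio-temporal cube has a clear advantage in processing performance.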

3.
The Geospatial Semantic Web promises better retrieval of geospatial information for Digital Earth systems by explicitly representing the semantics of data through ontologies. It also promotes sharing and reuse of geospatial data by encoding it in Semantic Web languages, such as RDF, to form a geospatial knowledge base. For many applications, rapid retrieval of spatial data from the knowledge base is critical. However, spatial data retrieval using the standard Semantic Web query language, Geo-SPARQL, can be very inefficient because the data in the knowledge base are no longer indexed to support efficient spatial queries. While recent research has been devoted to improving query performance on general knowledge bases, it is still challenging to support efficient queries over spatial data with complex topological relationships. This research introduces a query strategy that improves the query performance of a geospatial knowledge base by creating spatial indexes on the fly to prune the search space for spatial queries and by parallelizing the spatial join computations within the queries. We focus on improving the performance of Geo-SPARQL queries on knowledge bases encoded in RDF. Our initial experiments show that the proposed strategy can greatly reduce the runtime cost of Geo-SPARQL queries through on-the-fly spatial indexing and parallel execution.
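The idea of building a spatial index on the fly to prune a spatial join can be illustrated with a minimal sketch. It is not the strategy from the abstract above: it assumes geometries are reduced to bounding boxes `(minx, miny, maxx, maxy)`, uses a uniform grid as the index, and leaves out the parallel execution step. All function names and the `cell` size are hypothetical.

```python
from collections import defaultdict


def bbox_intersects(a, b):
    """Return True if two boxes (minx, miny, maxx, maxy) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells(bbox, cell):
    """Yield every grid cell (cx, cy) that the bounding box covers."""
    minx, miny, maxx, maxy = bbox
    for cx in range(int(minx // cell), int(maxx // cell) + 1):
        for cy in range(int(miny // cell), int(maxy // cell) + 1):
            yield cx, cy


def build_grid(boxes, cell):
    """Build the index at query time: map each grid cell to the box ids in it."""
    grid = defaultdict(list)
    for i, box in enumerate(boxes):
        for c in cells(box, cell):
            grid[c].append(i)
    return grid


def spatial_join(left, right, cell=10.0):
    """Return the set of (i, j) pairs where left[i] intersects right[j].

    Only boxes that share a grid cell are compared, so most
    non-intersecting pairs are never tested.
    """
    grid = build_grid(right, cell)
    result = set()
    for i, a in enumerate(left):
        candidates = {j for c in cells(a, cell) for j in grid.get(c, ())}
        for j in candidates:
            if bbox_intersects(a, right[j]):
                result.add((i, j))
    return result
```

Two intersecting boxes always share at least one grid cell, so the pruned join returns the same pairs as an all-pairs comparison. A real system would follow this filter step with an exact geometry test.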

4.
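As location information is widely applied across industries, big spatial data has grown rapidly. Besides large volume, big spatial data is also complex, and a growing number of applications place high demands on the timeliness of data. Traditional GIS software faces more and more challenges in hosting and processing spatial data: it is difficult to store and manage complex and diverse spatial data in an integrated way, and the traditional GIS software architecture and single-machine processing capacity cannot analyze spatial data of large volume (1 billion records or more). This paper discusses how to address these problems from three aspects: distributed storage technology, distributed spatial processing and computing technology, and distributed computing coordination technology. It proposes a distributed spatial processing and computing technology that combines the Spark distributed framework with the SuperMap iObject for Spark spatial processing engine, together with integrated database management and monitoring technology that enables unified management and monitoring of multiple databases such as PostgreSQL clusters, MongoDB, and Elasticsearch.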

5.
Big geospatial data is an emerging sub-area of geographic information science, big data, and cyberinfrastructure. Big geospatial data poses two unique challenges. First, raster and vector data structures and analyses have developed on largely separate paths for the last 20 years. This impedes geospatial researchers seeking to use big data platforms that do not support heterogeneous data types. Second, big spatial data repositories have yet to be integrated with big data computation platforms in ways that allow researchers to analyze big geospatial datasets spatio-temporally. IPUMS-Terra, a National Science Foundation cyberinfrastructure project, addresses these challenges by providing a unified framework of integrated geospatial services that access, analyze, and transform big heterogeneous spatio-temporal data. As IPUMS-Terra's data volume grows, we seek to integrate geospatial platforms that will scale geospatial analyses and address current bottlenecks within our system. However, our work shows that there are still unresolved challenges for big geospatial analysis. The most pertinent is the lack of a unified framework for conducting scalable, integrated vector and raster data analysis. We conducted a comparative analysis between PostgreSQL with PostGIS and SciDB and concluded that SciDB is the superior platform for scalable raster zonal analyses.

6.
Emerging computer architectures and systems that combine multi-core CPUs and accelerator technologies, such as many-core Graphics Processing Units (GPUs) and Intel's Many Integrated Core (MIC) coprocessors, can provide substantial computing power for many time-consuming spatio-temporal computations and applications. Although a distributed computing environment is suitable for large-scale geospatial computation, emerging advanced computing infrastructure remains unexplored in GIScience applications. This article introduces three categories of geospatial applications that exploit clusters of CPUs, GPUs, and MICs for comparative analysis. Within these three benchmark tests, the GPU clusters show advantages in the embarrassingly parallel use case. For spatial computation with light communication between computing nodes, GPU clusters perform similarly to MIC clusters on large data. For applications with intensive data communication between computing nodes, MIC clusters can perform better than GPU clusters. This conclusion will help the GIScience community deploy emerging heterogeneous computing infrastructure efficiently to achieve high-performance spatial computation over big data.

7.
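To address data interconnection and information integration problems in building large spatial information service platforms, this work studies three aspects: the architecture of the integration framework, the technical roadmap, and key technologies. It proposes a five-layer architecture model for a distributed geospatial information integration framework and the concept of an atomic spatial information service. By studying key technologies such as a distributed spatial query routing algorithm and a virtual quadtree model, it explores how to achieve interconnection of spatial information in a distributed environment.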

8.
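Research on query and indexing mechanisms for GML spatial data   Total citations: 9 (self-citations: 0, citations by others: 9)
Differences among traditional GIS data models make spatial data difficult to integrate and share. GIS software vendors and third-party software vendors have proposed solutions based on spatial data conversion, but these still do not adequately solve the problems of spatial data integration and sharing. The emergence of the Geography Markup Language (GML) provides a unified standard and framework for GIS spatial data modeling, integration, and sharing. GML has become the de facto international standard for encoding, transmitting, storing, and publishing spatial data, and large amounts of spatial data in GML format have begun to appear. How to store and manage GML spatial data effectively has become a hot topic in GIS research. Combining XML database technology with traditional spatial database technology, this paper studies the querying and indexing of GML spatial data in depth. Building on XQuery, the standard XML query language, it proposes spatial extensions to XQuery, develops a GML spatial data query language, and implements native querying of GML spatial data. Combining XML document encoding with traditional spatial data indexes, it proposes an integrated GML indexing mechanism based on spatial indexes and, taking the R-tree index as an example, experimentally analyzes the query processing performance of the integrated index. The experimental results show that the proposed mechanism is feasible and efficient.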

9.
Input/output (I/O) of geospatial raster data often becomes the bottleneck of parallel geospatial processing due to the large data size and diverse formats of raster data. The open-source Geospatial Data Abstraction Library (GDAL), which has been widely used to access diverse formats of geospatial raster data, has recently been applied to parallel geospatial raster processing. This article first explores the efficiency and feasibility of parallel raster I/O using GDAL under three common ways of domain decomposition: row-wise, column-wise, and block-wise. Experimental results show that parallel raster I/O using GDAL under column-wise or block-wise domain decomposition is highly inefficient and cannot achieve correct output, although GDAL performs well under row-wise domain decomposition. The reasons for this problem with GDAL are then analyzed, and a two-phase I/O strategy is proposed to overcome it. A data redistribution module based on the proposed I/O strategy is implemented for GDAL using a message-passing-interface (MPI) programming model. Experimental results show that the data redistribution module is effective.
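The three domain decompositions named in this abstract can be sketched as plain window arithmetic. This is an illustrative sketch only, with no GDAL or MPI calls; the function names are hypothetical, and the `(xoff, yoff, xsize, ysize)` window layout is an assumption chosen to resemble a typical raster read window.

```python
def split_range(length, parts):
    """Split [0, length) into `parts` contiguous (start, size) chunks
    whose sizes differ by at most one."""
    base, extra = divmod(length, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append((start, size))
        start += size
    return chunks


def decompose_rows(width, height, nprocs):
    """Row-wise: each process gets a band of full-width rows."""
    return [(0, y, width, h) for y, h in split_range(height, nprocs)]


def decompose_columns(width, height, nprocs):
    """Column-wise: each process gets a strip of full-height columns."""
    return [(x, 0, w, height) for x, w in split_range(width, nprocs)]


def decompose_blocks(width, height, block_rows, block_cols):
    """Block-wise: the raster is cut into block_rows x block_cols tiles."""
    return [(x, y, w, h)
            for y, h in split_range(height, block_rows)
            for x, w in split_range(width, block_cols)]
```

In a row-major raster file, a row-wise window maps to one contiguous byte range, whereas column-wise and block-wise windows touch many short, scattered runs. That difference in access pattern is one plausible reason the decompositions behave differently for I/O.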

10.
Geospatial simulation models can help us understand the dynamic aspects of Digital Earth. To implement high-performance simulation models for complex geospatial problems, grid computing and cloud computing are two promising computational frameworks. This research compares the benefits and drawbacks of both in Web-based frameworks by testing a parallel Geographic Information System (GIS) simulation model (Schelling's residential segregation model). The parallel GIS simulation model was tested on XSEDE (a representative grid computing platform) and Amazon EC2 (a representative cloud computing platform). The test results demonstrate that cloud computing platforms can provide almost the same parallel computing capability as high-end grid computing frameworks. However, cloud computing resources are more accessible to individual scientists, are easier to request and set up, and have a more scalable software architecture for on-demand and dedicated Web services. These advantages may attract more geospatial scientists to use cloud computing for the development of Digital Earth simulation models in the future.

11.
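Spatial data partitioning is an important part of big spatial data indexing methods and data storage. To address the shortcomings of the Hadoop cloud computing platform in spatial data partitioning and storage, this work proposes a parallel partitioning algorithm for massive spatial vector data based on the Hilbert space-filling curve. In the partitioning stage, it takes full account of factors such as the spatial relationships between neighboring objects, the size of each spatial object itself, and the number of spatial objects in the same code block. Following the partitioning principle of "merge small code blocks, split large code blocks", it implements a parallel partitioning algorithm for massive spatial vector data in a cloud environment. Experiments show that the algorithm not only improves the indexing efficiency of massive spatial vector data but also effectively solves the data skew problem of spatial vector data on the Hadoop distributed file system (HDFS).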
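Partitioning by Hilbert code with a merge-small, split-large rule can be illustrated with a simplified, single-machine sketch. It is not the algorithm from the abstract above: it assumes objects are already reduced to integer grid cells `(x, y)`, balances partitions by object count only (ignoring object size), and does nothing in parallel. The names `hilbert_index`, `partition_by_hilbert`, and `capacity` are hypothetical.

```python
from collections import defaultdict


def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2**order x 2**order grid to its distance
    along the Hilbert curve (standard iterative xy-to-d conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition_by_hilbert(points, order, capacity):
    """Group points into partitions of at most `capacity` objects.

    Points with the same Hilbert code form a code block. Blocks are visited
    in curve order: small neighboring blocks are merged into one partition,
    and a block larger than `capacity` is split into several partitions.
    """
    blocks = defaultdict(list)
    for x, y in points:
        blocks[hilbert_index(order, x, y)].append((x, y))

    partitions, current = [], []
    for code in sorted(blocks):
        block = blocks[code]
        if len(block) > capacity:
            # Split a large code block; flush any pending merged partition first.
            if current:
                partitions.append(current)
                current = []
            for i in range(0, len(block), capacity):
                partitions.append(block[i:i + capacity])
        else:
            # Merge small code blocks until the partition would overflow.
            if len(current) + len(block) > capacity:
                partitions.append(current)
                current = []
            current.extend(block)
    if current:
        partitions.append(current)
    return partitions
```

Because consecutive Hilbert codes correspond to neighboring cells, merged partitions stay spatially compact, while the capacity bound keeps any single partition from growing disproportionately large.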

12.
Light detection and ranging (LiDAR) data are essential for scientific discovery in fields such as Earth and ecological sciences, for environmental applications, and for responding to natural disasters. While collecting LiDAR data over large areas is quite possible, the subsequent processing steps typically involve large computational demands. Efficiently storing, managing, and processing LiDAR data are prerequisites for enabling these LiDAR-based applications. However, handling LiDAR data poses grand geoprocessing challenges due to data and computational intensity. To tackle such challenges, we developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle 'big' LiDAR data collections. The contributions of this research were (1) a tile-based spatial index to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, (2) two spatial decomposition techniques to enable efficient parallelization of different types of LiDAR processing tasks, and (3) a coupling of existing LiDAR processing tools with Hadoop, so that a variety of LiDAR data processing tasks can be conducted in parallel in a highly scalable distributed computing environment using an online geoprocessing application. A proof-of-concept prototype is presented here to demonstrate the feasibility, performance, and scalability of the proposed framework.

13.
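Wu Zheng, Wu Pengda, Li Chengming. Acta Geodaetica et Cartographica Sinica (《测绘学报》), 2019, 48(11): 1369-1379
Spatio-temporal indexing is one of the key technologies for storing and managing spatio-temporal data, and indexing methods based on space-filling curves (SFC) have received wide attention in recent years. For vector data, however, existing indexing methods mostly focus on implementing the spatial index and struggle to deliver efficient temporal and spatial queries at the same time. For non-point features (line and polygon features), determining the optimal index level has long been a difficulty. This paper therefore proposes a method for building a spatio-temporal index with adaptive levels, aimed at peer-to-peer network environments. It first proposes a joint encoding of spatio-temporal information based on a strategy that combines a partition key with an in-partition sort key, then designs spatio-temporal representation structures for point and non-point features on that basis, and finally designs a multi-level tree structure to build the spatio-temporal index MLS3 (multi-level sphere 3), adaptively determining the optimal index level from characteristics such as the temporal granularity and spatial density of geographic entities. Experiments were run on real trajectory (point feature), road (line feature), and building (polygon feature) data. The results show that, compared with the XZ3 spatio-temporal index proposed by GeoMesa, the proposed method effectively solves the spatio-temporal representation and level division of non-point features and achieves more efficient spatio-temporal retrieval while avoiding storage hotspots.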

14.
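Place-name data on geospatial data exchange and sharing platforms keeps growing, and data differ across departments, which makes place-name retrieval inefficient. To address this, the paper analyzes how place names are expressed on such platforms, designs a multi-level index organization for place-name information, proposes a method for building a place-name feature dictionary, and designs and develops a prototype system that implements a retrieval framework based on Lucene and place-name feature words. Experiments show that the multi-level index reduces the complexity of place-name retrieval by linking a base index, a feature index, and a category index, and that it achieves high retrieval efficiency and accuracy. Applied to the Zhejiang Province geospatial data exchange and sharing platform, it achieved good results.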

15.
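An object-oriented integrated raster-vector 3D data model   Total citations: 1 (self-citations: 0, citations by others: 1)
The paper first discusses existing 3D spatial data models and analyzes the characteristics of raster, vector, and hybrid data models, then proposes an object-oriented integrated raster-vector data model. The model organizes raster data in a vector manner, so it has the advantages of both vector and raster data models and overcomes the shortcomings of the hybrid models in common use today. The paper also proposes a three-level raster subdivision of 3D space with a row-order encoding method, which requires little storage and supports fast indexing and computation. Finally, it gives the concrete data structures.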

16.
Spatial Data Infrastructures (SDIs) now play an important role in government agencies at the global, national, and local levels. They aim to improve the management and sharing of geospatial data. Nonetheless, these SDIs have been developed as information islands, in which a user's query is compared only to the metadata described in their own catalog services. The lack of interaction among SDIs limits the potential of these infrastructures to provide geospatial data to a larger audience. This article presents a distributed architecture based on a federation of SDIs that interact among themselves using query propagation. This propagation facilitates data discovery and sharing. We also describe a distributed query processing service used to enable resource discovery in distributed infrastructures.

17.
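With the rapid development of GIS data acquisition and processing technology, vector spatial data, typified by land use data, keeps growing in size, and many production applications place higher demands on the performance of overlay-and-assign operations between vector layers. This paper proposes an overlay-and-assign method for vector data based on Apache Spark. It extends Spark's resilient distributed dataset (RDD) to improve its ability to represent GIS spatial data, and builds a spatial index so that the overlay computation runs efficiently in a distributed manner across the nodes of a Spark cluster. Experiments on data at three orders of magnitude (one hundred thousand, one million, and ten million) show that, compared with the traditional algorithm, the Spark-based method improves performance by 30% to 90%.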

18.
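Research on an integrated spatial data structure and its indexing mechanism   Total citations: 17 (self-citations: 1, citations by others: 16)
Tan Guoxin. Acta Geodaetica et Cartographica Sinica (《测绘学报》), 1998, 27(4): 293-299
This paper proposes a new integrated raster-vector spatial data structure. The structure uses a three-level subdivision strategy and a technique that represents geometric objects by filling them with elementary cells, so that spatial data can be rasterized while still meeting accuracy requirements. It also introduces an arc raster bit array and an adaptive spatial index structure for area features, which effectively improve spatial retrieval efficiency. Experiments show that the theory and methods are feasible.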

19.
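In the big data era, geospatial resources keep growing, but existing general-purpose knowledge bases give little consideration to the semantic knowledge contained in geospatial data, which makes fast data retrieval difficult. Ontology technology is therefore urgently needed so that, building on this semantic knowledge, geospatial data can be accessed faster and the information users need can be obtained precisely. Based on ontologies, this work proposes a fast retrieval method that takes the semantic knowledge of geospatial data into account. First, based on generic place-name coding rules, geospatial data, and open-source Baidu Baike data, it builds...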

20.
An online spatial biodiversity model (SBM) for optimized and automated spatial modelling and analysis of geospatial data is proposed, based on web processing service (WPS) and web service orchestration (WSO) in a parallel computing environment. The model integrates distributed geospatial data into a geoscientific processing workflow to compute spatial landscape indices over the web using free and open source software. A case study for the state of Uttarakhand, India, demonstrates model outputs such as the spatial biodiversity disturbance index (SBDI) and the spatial biological richness index (SBRI). To optimize and automate the process, an interactive web interface is developed using participatory GIS approaches for implementing fuzzy AHP. In addition, sensitivity analysis and geosimulation experiments are performed in a distributed GIS environment. Results suggest that the parallel algorithms in SBM execute faster than the sequential ones, and validation of SBRI against biological diversity shows significant correlation, indicated by high R² values.
