1.

An efficient data organization and scheduling strategy for accelerating large vector data rendering

Mingqiang Guo Ying Huang Qingfeng Guan Zhong Xie Liang Wu 《Transactions in GIS》2017,21(6):1217-1236

Rendering large volumes of vector data is computationally intensive and therefore time consuming, leading to lower efficiency and poorer interactive experience. Graphics processing units (GPUs) are powerful tools in data parallel processing but lie idle most of the time. In this study, we propose an approach to improve the performance of vector data rendering by using the parallel computing capability of many‐core GPUs. Vertex transformation, largely a mathematical calculation that does not require communication with the host storage device, is a time‐consuming procedure because all coordinates of each vector feature need to be transformed to screen vertices. Use of a GPU enables optimization of a general‐purpose mathematical calculation, enabling the procedure to be executed in parallel on a many‐core GPU and optimized effectively. This study mainly focuses on: (1) an organization and storage strategy for vector data based on equal pitch alignment, which can adapt to the GPU's calculating characteristics; (2) a paging‐coalescing transfer and memory access strategy for vector data between the CPU and the GPU; and (3) a balancing allocation strategy to take full advantage of all processing cores of the GPU. Experimental results demonstrate that the approach proposed can significantly improve the efficiency of vector data rendering.
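The vertex transformation step described in this abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' GPU implementation: the function name `world_to_screen`, the extent tuple, and the screen size are assumptions made for illustration. The point is only that each vertex is transformed independently, which is why the step parallelizes well.

```python
def world_to_screen(points, extent, width, height):
    """Map (x, y) map coordinates to integer pixel coordinates.

    extent is (xmin, ymin, xmax, ymax). The screen origin is assumed to be
    the top-left corner, so the y axis is flipped.
    """
    xmin, ymin, xmax, ymax = extent
    sx = width / (xmax - xmin)    # pixels per map unit, horizontal
    sy = height / (ymax - ymin)   # pixels per map unit, vertical
    # Every vertex is independent of the others: one GPU thread per vertex
    # could evaluate exactly this expression.
    return [(int((x - xmin) * sx), int((ymax - y) * sy)) for x, y in points]


vertices = [(0.0, 0.0), (50.0, 25.0), (100.0, 100.0)]
screen = world_to_screen(vertices, (0.0, 0.0, 100.0, 100.0), 800, 600)
```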

2.

A fine-grained parallel optimization method for large-scale data based on the PMVS algorithm

刘金硕李扬眉江庄毅邓娟眭海刚 Pan Jeff 《武汉大学学报(信息科学版)》2019,44(4):608-616

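The patch-based multi-view stereo (PMVS) algorithm is widely used in fields such as digital cities thanks to its good 3D reconstruction results, but it runs inefficiently on large-scale computations. To address this, a fine-grained parallel optimization method is proposed that optimizes three aspects: task partitioning and load balancing, host-memory and GPU-memory usage, and communication overhead. In addition, GPU and multithreaded parallel versions of the patch-based feature extraction in PMVS are designed, achieving multi-granularity cooperative parallelism across CPUs and GPUs (CPUs_GPUs). Experimental results show that the CPU multithreading strategy achieves a 4× speedup, the compute unified device architecture (CUDA) parallel strategy achieves a speedup of up to 34×, and the proposed strategy delivers a further 30% performance improvement over the CUDA parallel strategy. The method can be used to schedule computing resources quickly in big-data processing in other fields.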

3.

A parallel Harris corner detection algorithm based on multiple GPUs

肖汉周清雷张祖勋《武汉大学学报(信息科学版)》2012,37(7):876-881

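A parallel Harris corner detection algorithm based on a multi-GPU (graphics processing unit) design is proposed. Many threads are used to convert the time-consuming Gaussian convolution smoothing of the image into single instruction, multiple thread (SIMT) mode, and the GPU's shared memory, constant memory, and page-locked memory mechanisms are used to carry out the entire corner detection process on the compute unified device architecture (CUDA). Experimental results show that the multi-GPU parallel Harris corner detection algorithm achieves a speedup of up to 60× over the serial CPU algorithm. Its execution efficiency is markedly higher, and it shows good real-time processing capability for large-scale data.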

4.

A CPU/GPU method with workload distribution for modulation transfer function compensation of high-resolution satellite images

方留杨王密李德仁潘俊《测绘学报》2014,43(6):598-606

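This paper systematically examines a method for modulation transfer function (MTF) compensation of high-resolution satellite images using CPU/GPU cooperative processing. The method is first given a basic GPU implementation, and three performance optimization strategies (execution configuration optimization, memory access optimization, and instruction optimization) then further improve its execution efficiency. On a CPU/GPU system consisting of an Intel Xeon E5650 CPU and an NVIDIA Tesla C2050 GPU, MTF compensation of a Gaofen-1 (GF-1) panchromatic image reaches a speedup of 42.80×. Building on this, to make full use of the CPU's computing power, a CPU/GPU workload distribution strategy assigns part of the load to the CPU. With this strategy the speedup reaches 47.82× and the processing time is reduced to 1.62 s, which meets the requirement of near-real-time MTF compensation for high-resolution satellite images.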
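One simple way to think about a CPU/GPU workload distribution is a proportional split, sketched below. This is an assumption for illustration, not the paper's strategy: the function names and the throughput figures (1.0 and 9.0 rows per unit time) are hypothetical and are not taken from the reported experiments.

```python
def split_rows(total_rows, cpu_rate, gpu_rate):
    """Split image rows between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are rows processed per unit time (hypothetical
    numbers that would normally come from a short calibration run).
    """
    cpu_rows = round(total_rows * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_rows, total_rows - cpu_rows


def finish_time(rows, rate):
    """Time one device needs to process its share."""
    return rows / rate


cpu_rows, gpu_rows = split_rows(10_000, cpu_rate=1.0, gpu_rate=9.0)
# With a proportional split both devices finish at about the same time,
# so neither sits idle while the other is still working.
cpu_time = finish_time(cpu_rows, 1.0)
gpu_time = finish_time(gpu_rows, 9.0)
```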

5.

A GPU-based method for computing the intersection area of arbitrary polygons

高艺罗健欣裘杭萍唐斌吴波《测绘工程》2017,26(12)

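Efficient computation of the intersection area of arbitrary polygons has long been a focus of research on spatial analysis algorithms in geographic information systems. This paper proposes GPURAS, a GPU-based rasterization algorithm for polygon intersection area. Building on it, the Monte Carlo method and occlusion queries are used to derive the GPURASMC and GPURASQ algorithms, respectively, and the correctness of these algorithms is proved. Experiments compare the algorithms on simple polygons, arbitrary complex polygons, and polygons with large data volumes. The results show that GPURAS is accurate and fairly general, but its efficiency is affected by CPU-GPU communication latency; GPURASMC is more efficient but sacrifices some accuracy; GPURASQ is both accurate and efficient but is limited to specific runtime environments. Compared with traditional CPU-based algorithms, all three proposed algorithms are more efficient, and the gain is especially pronounced for polygons with many vertices.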
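The accuracy-for-speed trade-off of a Monte Carlo area estimate can be seen in a small CPU-only sketch. It is not GPURASMC itself, which works on GPU-rasterized polygons; the function names, the sample count, and the fixed seed are assumptions. Random points are drawn in the joint bounding box and the fraction falling inside both polygons is scaled by the box area.

```python
import random


def point_in_polygon(x, y, poly):
    """Even-odd ray casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mc_intersection_area(poly_a, poly_b, samples=20000, seed=0):
    """Estimate the intersection area by uniform sampling in the bounding box."""
    rng = random.Random(seed)
    xs = [p[0] for p in poly_a + poly_b]
    ys = [p[1] for p in poly_a + poly_b]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly_a) and point_in_polygon(x, y, poly_b):
            hits += 1
    return hits / samples * (xmax - xmin) * (ymax - ymin)


square_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
square_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
estimate = mc_intersection_area(square_a, square_b)  # exact area is 1.0
```

The estimate only approximates the exact value, and its error shrinks roughly with the square root of the sample count.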

6.

A cooperative parallel spatial interpolation algorithm for CPU/GPU heterogeneous environments

王鸿琰关雪峰吴华意《武汉大学学报(信息科学版)》2017,42(12):1688-1695

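CPU/GPU heterogeneous hybrid systems are a new kind of high-performance computing platform, but existing parallel spatial interpolation algorithms rely on either the CPU or the GPU alone for acceleration. Cooperative parallel spatial interpolation algorithms are therefore needed to make full use of heterogeneous computing resources and further improve interpolation efficiency. Taking thin plate spline interpolation as an example, a CPU/GPU cooperative parallel interpolation algorithm is proposed to accelerate the generation of digital elevation models (DEMs) from massive light detection and ranging (LiDAR) point clouds. Interpolation tasks are decomposed and wrapped in an abstraction that hides differences in the underlying hardware execution modes. On top of a multi-level cooperative parallel framework, a Greedy-SET dynamic scheduling strategy is designed that accounts for differences in hardware capability, so that heterogeneous parallel resources are fully used and the load is well balanced. Experiments show that the cooperative parallel interpolation algorithm achieves a 19.6× speedup on a high-performance workstation. Compared with CPU-only and GPU-only parallel algorithms, its efficiency improves by 54% and 44%, respectively, achieving efficient cooperative parallel processing.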

7.

A GPGPU-based parallel algorithm for full-waveform decomposition

王宗跃马洪超明洋《遥感学报》2014,18(6):1217-1222

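The EM (Expectation Maximization) waveform decomposition algorithm involves many iterations and a large number of compute-intensive multiplications, divisions, and accumulations. A scheme is therefore proposed for parallelizing the EM algorithm on a general-purpose graphics processing unit (GPGPU). Based on the memory hierarchy of the CUDA general-purpose parallel computing architecture, an overall parallel scheme is designed that fully exploits the fast access of shared memory and texture memory. Because waveform samples are stored as bytes, the median is obtained from a histogram of the sample values, which reduces the computational complexity of determining the noise threshold. Finally, a parallel sum-reduction strategy improves the efficiency of the many accumulations in the EM iterations. Experimental results show that, with reasonable parallel parameters, more than 16 EM iterations, and more than 64 M of data, the GPU achieves a speedup of 8 over single-core CPU processing, which significantly improves the efficiency of full-waveform decomposition.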
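Two of the ideas in this abstract, the histogram-based median for byte-valued samples and the sum reduction, can be sketched sequentially. The function names are hypothetical and the code runs on the CPU; it only shows the counting logic and the pairwise access pattern, not the authors' CUDA kernels.

```python
def histogram_median(samples):
    """Median of byte-valued samples (0..255) from a 256-bin histogram.

    Avoids sorting: one counting pass plus a scan over at most 256 bins.
    For even-length input this returns the lower of the two middle values.
    """
    if len(samples) == 0:
        raise ValueError("empty input")
    hist = [0] * 256
    for s in samples:
        hist[s] += 1
    target = (len(samples) + 1) // 2  # rank of the (lower) median
    running = 0
    for value, count in enumerate(hist):
        running += count
        if running >= target:
            return value


def tree_sum(values):
    """Pairwise (tree) sum reduction.

    Each pass halves the number of partial sums, which is the pattern a
    GPU thread block follows when reducing in shared memory.
    """
    data = list(values)
    if not data:
        return 0
    while len(data) > 1:
        if len(data) % 2:
            data.append(0)  # pad so every element has a partner
        data = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
    return data[0]


median = histogram_median(bytes([5, 1, 3, 2, 4]))
total = tree_sum(range(10))
```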

8.

Out‐of‐Core GPU‐based Change Detection in Massive 3D Point Clouds

Rico Richter Jan Eric Kyprianidis Jürgen Döllner 《Transactions in GIS》2013,17(5):724-741

If sites, cities, and landscapes are captured at different points in time using technology such as LiDAR, large collections of 3D point clouds result. Their efficient storage, processing, analysis, and presentation constitute a challenging task because of limited computation, memory, and time resources. In this work, we present an approach to detect changes in massive 3D point clouds based on an out‐of‐core spatial data structure that is designed to store data acquired at different points in time and to efficiently attribute 3D points with distance information. Based on this data structure, we present and evaluate different processing schemes optimized for performing the calculation on the CPU and GPU. In addition, we present a point‐based rendering technique adapted for attributed 3D point clouds, to enable effective out‐of‐core real‐time visualization of the computation results. Our approach enables conclusions to be drawn about temporal changes in large highly accurate 3D geodata sets of a captured area at reasonable preprocessing and rendering times. We evaluate our approach with two data sets from different points in time for the urban area of a city, describe its characteristics, and report on applications.

9.

Parallel color balancing with an adaptive block-based weighted Wallis transform

李烁王慧王利勇于翔舟杨乐《遥感学报》2019,23(4):706-716

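To address color differences among multiple images to be mosaicked within a region, a GPU-based parallel color balancing algorithm using a block-based weighted Wallis transform is proposed. First, each image is adaptively partitioned into blocks according to the coefficient of variation, the transform parameters of every pixel are determined by bilinear interpolation, and the weighted Wallis transform is applied to remove color differences between images. Then, to control the color balancing quality of the region as a whole, the processing order of the images is determined using a Voronoi diagram and Dijkstra's algorithm. Finally, the parallel tasks are designed for the GPU and optimized in terms of execution configuration, memory access, and instruction throughput to improve computational efficiency. Experimental results show that the method effectively removes both color differences and contrast differences between images. Compared with the serial CPU algorithm, the GPU parallel algorithm significantly reduces computation time, with a maximum speedup of more than 60×.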
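For orientation, the commonly cited form of the Wallis transform can be sketched as below. This is the textbook per-block version, assumed here for illustration; it does not reproduce the paper's adaptive blocking, bilinear parameter interpolation, or weighting. The names `wallis`, `c` (contrast expansion constant), and `b` (brightness coefficient) are illustrative.

```python
from statistics import mean, pstdev


def wallis(pixels, target_mean, target_std, c=1.0, b=1.0):
    """Apply a Wallis transform to one block of gray values.

    Maps the block's mean and standard deviation toward the target values.
    c and b are expected in [0, 1]. With c = 1 a perfectly flat block
    (zero standard deviation) cannot be stretched and raises
    ZeroDivisionError.
    """
    m = mean(pixels)
    s = pstdev(pixels)
    gain = c * target_std / (c * s + (1.0 - c) * target_std)
    offset = b * target_mean + (1.0 - b) * m
    # Each pixel is transformed independently, so a GPU can assign one
    # thread per pixel.
    return [(p - m) * gain + offset for p in pixels]


block = [10, 20, 30, 40]
balanced = wallis(block, target_mean=100.0, target_std=20.0)
```

With `c = 1` and `b = 1` the output block takes on exactly the target mean and standard deviation; smaller values pull it only part of the way.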

10.

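GPU-CPU cooperative processing for orthorectification of remote sensing images (total citations: 1; self-citations: 0; citations by others: 1)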

杨靖宇张永生李正国龚辉《武汉大学学报(信息科学版)》2011,36(9)

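A CUDA-based GPU-CPU cooperative processing method for orthorectification of remote sensing images is proposed to achieve fine-grained GPU parallelization of the resampling operation. Based on the GPU's parallel structure and hardware characteristics, execution configuration optimization is used to raise warp occupancy, shared memory is used to reduce repeated accesses to coordinate transformation coefficients held in slow global memory, and texture memory replaces global memory for accessing the original image data. Experimental results show that the parallel algorithm makes full use of the GPU's parallel processing capability. In a comparative polynomial rectification experiment on a 6 000 × 6 000 pixel panchromatic image using a GeForce 9500 GT graphics card, the final speedups for nearest-neighbor and bilinear gray-level interpolation resampling exceed 8× and 10×, respectively.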
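The two resampling kernels compared in this abstract can be sketched on the CPU as follows. The function names and the list-of-rows image layout are assumptions, and coordinates are assumed to be non-negative and inside the image; on a GPU each output pixel would evaluate one of these functions in its own thread.

```python
def nearest(image, x, y):
    """Nearest-neighbor resampling at fractional column x, row y."""
    col = min(int(x + 0.5), len(image[0]) - 1)
    row = min(int(y + 0.5), len(image) - 1)
    return image[row][col]


def bilinear(image, x, y):
    """Bilinear resampling at fractional column x, row y."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(image) - 1)     # clamp at the bottom edge
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


img = [[0, 10],
       [20, 30]]
center = bilinear(img, 0.5, 0.5)
closest = nearest(img, 0.2, 0.9)
```

Bilinear interpolation reads four source pixels per output pixel instead of one, so it does more arithmetic per memory access.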

11.

A parallel framework for data-intensive vector polygon geographic big data in CPU+GPU heterogeneous environments

徐云耘周琛李满春《测绘通报》2022,(5):110-119

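This paper proposes PFGAP, a parallel computing framework for data-intensive vector polygon geographic big data in CPU+GPU heterogeneous environments. PFGAP decomposes the parallel computation into five modules (operator, data, granularity, parallel environment, and task scheduling) and designs a load-balancing parallel strategy for each. By encapsulating the details of the parallel implementation, it enables rapid parallelization of data-intensive polygon operators. Experiments use polygon triangulation, rasterization, and projection transformation as test cases and land-use data as test data, and measure parallel efficiency in different types of parallel environments. The results show that PFGAP adapts well to different types of datasets, operators, and parallel computing environments. Parallel algorithms implemented with PFGAP significantly reduce execution time relative to serial execution, achieving a best parallel speedup of 40.03. The parallel strategies of the individual modules were also tested, and the results show that their parallel efficiency exceeds that of existing parallel strategies.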

12.

A fast band registration method for multispectral images using CPU and GPU cooperation

方留杨王密潘俊《武汉大学学报(信息科学版)》2018,43(7):1000-1007

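With the rapid growth in the volume of remote sensing image data, traditional serial band registration methods can no longer meet the real-time registration needs of big-data multispectral images. To address this, a fast band registration method for multispectral images based on CPU and GPU cooperation is proposed. First, the computational load and degree of parallelism are analyzed; tie-point matching and differential rectification are mapped to the GPU, while fitting of the affine transformation coefficients remains on the CPU. Second, kernel task mapping and basic configuration make the algorithm steps executable on the GPU, and three performance optimizations (memory access optimization, instruction optimization, and overlapping of transfer and computation) further improve the execution efficiency of band registration. On an experimental platform consisting of an NVIDIA Tesla M2050 GPU and an Intel Xeon E5650 CPU, experiments on multispectral images from the Yaogan-26 satellite show that the accelerated band registration takes only 3.25 s, a speedup of 32.32× over the traditional serial method, which meets the near-real-time registration needs of big-data multispectral images.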

13.

An efficient geosciences workflow on multi-core processors and GPUs: a case study for aerosol optical depth retrieval from MODIS satellite data

Jia Liu Dustin Feld Jochen Garcke Thomas Soddemann Peiyuan Pan 《International Journal of Digital Earth》2016,9(8):748-765

Quantitative remote sensing retrieval algorithms help understanding the dynamic aspects of Digital Earth. However, the Big Data and complex models in Digital Earth pose grand challenges for computation infrastructures. In this article, taking the aerosol optical depth (AOD) retrieval as a study case, we exploit parallel computing methods for high efficient geophysical parameter retrieval. We present an efficient geocomputation workflow for the AOD calculation from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. According to their individual potential for parallelization, several procedures were adapted and implemented for a successful parallel execution on multi-core processors and Graphics Processing Units (GPUs). The benchmarks in this paper validate the high parallel performance of the retrieval workflow with speedups of up to 5.x on a multi-core processor with 8 threads and 43.x on a GPU. To specifically address the time-consuming model retrieval part, hybrid parallel patterns which combine the multi-core processor’s and the GPU’s compute power were implemented with static and dynamic workload distributions and evaluated on two systems with different CPU–GPU configurations. It is shown that only the dynamic hybrid implementation leads to a greatly enhanced overall exploitation of the heterogeneous hardware environment in varying circumstances.
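The difference between static and dynamic workload distribution can be illustrated with a toy simulation. It is not the article's implementation: the two schedulers, the chunk costs, and the worker speeds (a slow worker at 1.0 and a fast one at 4.0) are assumptions chosen to make the effect visible.

```python
import heapq


def static_makespan(chunk_costs, speeds):
    """Round-robin assignment fixed in advance, ignoring worker speed."""
    loads = [0.0] * len(speeds)
    for i, cost in enumerate(chunk_costs):
        loads[i % len(speeds)] += cost
    return max(load / s for load, s in zip(loads, speeds))


def dynamic_makespan(chunk_costs, speeds):
    """Hand each chunk to whichever worker becomes free first.

    Returns the finish time of the last worker and the number of chunks
    each worker processed.
    """
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    counts = [0] * len(speeds)
    for cost in chunk_costs:
        t, w = heapq.heappop(free_at)  # earliest-available worker
        counts[w] += 1
        heapq.heappush(free_at, (t + cost / speeds[w], w))
    return max(t for t, _ in free_at), counts


chunks = [1.0] * 10
speeds = [1.0, 4.0]  # hypothetical: worker 1 is four times faster
static_time = static_makespan(chunks, speeds)
dynamic_time, per_worker = dynamic_makespan(chunks, speeds)
```

In this toy setup the static split leaves the fast worker idle while the slow one finishes, whereas the dynamic scheduler lets the fast worker pick up most of the chunks.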

14.

Design and implementation of GPU-based GNSS signal tracking (total citations: 1; self-citations: 0; citations by others: 1)

张尧唐小妹陈华明孙广富《全球定位系统》2014,(5):59-63

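Software receivers play an important role in data post-processing and in algorithm design and analysis. Traditional software receivers are implemented on CPUs and therefore process data inefficiently. The graphics processing unit (GPU) is a highly parallel processor. Combining the tracking stage, which is the most parallel and most time-critical part of navigation signal processing, with the GPU's parallel processing architecture can greatly improve program efficiency. This paper solves the key technical problems of implementing signal tracking on a GPU and presents and implements a corresponding design. Test results show that GPU-based signal tracking improves efficiency by a factor of 112.5.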

15.

CPU/GPU near real-time preprocessing for ZY-3 satellite images: Relative radiometric correction, MTF compensation, and geocorrection

《ISPRS Journal of Photogrammetry and Remote Sensing》2014

ZY-3 is the first high-accuracy civil stereo-mapping optical satellite of China. It greatly improves China’s optical satellite image resolution with a boom in data volume, calling for new challenges in processing real-time applications. On the other hand, using central processing unit (CPU)/graphic processing unit (GPU) to resolve data-intensive remote sensing problems becomes a hot issue. In this paper, we present an approach for CPU/GPU near real-time preprocessing of ZY-3 satellite images, focusing on three key processors: relative radiometric correction (RRC), modulation transfer function compensation (MTFC), and geocorrection (GC). First, basic GPU implementation issues are addressed to make the processors capable of processing with GPU. Second, three effective GPU specific optimizations are applied for further improvement of the GPU performance. Furthermore, to fully exploit the CPU’s computing horsepower within the system, a CPU/GPU workload distribution scheme is proposed, in which CPU undertakes partial computation to share the workloads of GPU. The experimental result shows that our approach achieved an overall 48.84-fold speedup ratio in ZY-3 nadir image preprocessing (the corresponding run time is 11.60 s for one image), which is capable of meeting the requirement of near real-time response to the applications that follow. In addition, with the supportability of IEEE 754–2008 floating-point standard in the Fermi type GPU, preprocessing ZY-3 images with our CPU/GPU processors could maintain the quality of image preprocess as done traditionally with CPU processors.

16.

A fast quality-guided phase unwrapping algorithm for heterogeneous environments

钟何平张森田振唐劲松《武汉大学学报(信息科学版)》2015,40(6):756-760

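A fast quality-guided phase unwrapping algorithm for heterogeneous environments is proposed. The interferometric phase map is divided into blocks that are loaded into the shared memory of the graphics processing unit (GPU), enabling highly parallel computation of the interferometric phase quality map. The quality map is then downloaded to host memory, and the CPU performs quantized quality-guided unwrapping to obtain the final unwrapped phase. The algorithm exploits the respective computing characteristics of the GPU and the CPU to unwrap the phase quickly. Unwrapping experiments on InSAR and InSAS interferometric phase maps confirm the efficiency of the proposed algorithm.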

17.

Concurrent Drainage Network Rendering for Automated Pen‐and‐ink Style Landscape Illustration

James E. Mower 《Transactions in GIS》2016,20(1):54-78

Pen‐and‐ink style geomorphological illustrations render landscape elements critical to the understanding of surface processes within a viewshed and, at their highest levels of execution, represent works of art, being both practical and beautiful. The execution of a pen‐and‐ink composition, however, requires inordinate amounts of time and skill. This article will introduce an algorithm for rendering creases – linework representing visually significant morphological features – at animation speeds, made possible with recent advances in graphics processing unit (GPU) architectures and rendering APIs. Beginning with a preprocessed high‐resolution drainage network model, creases are rendered from selected stream segments if their weighted criteria (slope, flow accumulation, and surface illumination), attenuated by perspective distance from the viewpoint, exceed a threshold. The algorithm thus provides a methodology for crease representation at continuous levels of detail down to the highest resolution of the preprocessed drainage model over a range of surface orientation and illumination conditions. The article also presents an implementation of the crease algorithm with frame rates exceeding those necessary to support animation, supporting the proposition that parallel processing techniques exposed through modern GPU programming environments provide cartographers with a new and inexpensive toolkit for constructing alternative and attractive real‐time animated landscape visualizations for spatial analysis.

18.

CUDA parallel optimization of radiometric correction for Tiangong-1 hyperspectral data

赵海娜吴远峰张兵《遥感学报》2014,18(Z1):49-55

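Radiometric correction removes the response differences among detector elements in hyperspectral images, so that the data better meet the requirements of thematic information extraction. Statistics of the detector elements, such as column means and column standard deviations, are used to check the radiometric correction of Tiangong-1 shortwave infrared hyperspectral data, and three relative radiometric correction algorithms (mean normalization, moment matching, and adjacent-column equalization) are optimized for parallel computation using the GPU CUDA computing model. The radiometric correction workflow is split so that the CPU controls the flow logic and the GPU performs data-level parallel computation, and a mapping is established between CUDA compute units and data units, yielding a computational speedup of 5 to 7×. These radiometric correction algorithms rely on the statistics of the image itself and are easy to parallelize. They meet the timeliness requirements of real-time correction and offer a new approach to future on-orbit real-time radiometric correction of hyperspectral data.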

19.

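Design of a CUDA parallel algorithm for CVA change detection in remote sensing images (total citations: 1; self-citations: 1; citations by others: 0)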

常方正赵银娣刘善磊《遥感学报》2016,20(1):114-128

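As the volume and complexity of remote sensing image data keep growing, fast processing of remote sensing images has become a pressing problem in practical applications. To achieve real-time change detection in remote sensing images, a parallel processing model based on the compute unified device architecture (CUDA) is designed for the change detection algorithm based on change vector analysis (CVA). First, the Geospatial Data Abstraction Library (GDAL) is used to read, process, and save large remote sensing images block by block. Second, CVA-based change detection is divided into change magnitude detection, lookup table construction, and change direction detection, and these three steps are implemented with CUDA C on a heterogeneous CPU/GPU platform. Finally, the model is used to perform CVA change detection on remote sensing images of different data volumes, and the results are compared. Experimental results show that GPU/CUDA-based CVA change detection is about 10 times faster than the serial CPU implementation and, to a certain extent, achieves real-time change detection.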
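The per-pixel core of change vector analysis can be sketched as below. The change magnitude is the Euclidean length of the spectral difference vector; the direction is encoded here as a sign-pattern sector code, which is one common convention and an assumption on our part, since the abstract does not say how the paper's lookup table encodes direction. The function name is hypothetical.

```python
import math


def change_vector(before, after):
    """Change magnitude and direction code for one pixel.

    before and after are per-band values of the same pixel at two dates.
    The direction code packs the sign of each band difference into one
    integer (bit set when the band value did not decrease), giving 2**n
    possible sectors for n bands.
    """
    diff = [a - b for b, a in zip(before, after)]
    magnitude = math.sqrt(sum(d * d for d in diff))
    code = 0
    for d in diff:
        code = code * 2 + (1 if d >= 0 else 0)
    return magnitude, code


increase = change_vector([10, 20], [13, 24])  # both bands increase
mixed = change_vector([10, 20], [13, 16])     # band 1 up, band 2 down
```

Each pixel is processed independently of its neighbors, which is what makes the method a natural fit for one GPU thread per pixel.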

20.

Parallel computing solutions for Markov chain spatial sequential simulation of categorical fields

Weixing Zhang Weidong Li Tian Zhao 《International Journal of Digital Earth》2019,12(5):566-582

The Markov chain random field (MCRF) model is a spatial statistical approach for modeling categorical spatial variables in multiple dimensions. However, this approach tends to be computationally costly when dealing with large data sets because of its sequential simulation processes. Therefore, improving its computational efficiency is necessary in order to run this model on larger sizes of spatial data. In this study, we suggested four parallel computing solutions by using both central processing unit (CPU) and graphics processing unit (GPU) for executing the sequential simulation algorithm of the MCRF model, and compared them with the nonparallel computing solution on computation time spent for a land cover post-classification. The four parallel computing solutions are: (1) multicore processor parallel computing (MP), (2) parallel computing by GPU-accelerated nearest neighbor searching (GNNS), (3) MP with GPU-accelerated nearest neighbor searching (MP-GNNS), and (4) parallel computing by GPU-accelerated approximation and GPU-accelerated nearest neighbor searching (GA-GNNS). Experimental results indicated that all of the four parallel computing solutions are at least 1.8× faster than the nonparallel solution. Particularly, the GA-GNNS solution with 512 threads per block is around 83× faster than the nonparallel solution when conducting a land cover post-classification with a remotely sensed image of 1000 × 1000 pixels.