Similar Literature
20 similar records found (search time: 46 ms)
1.
To assess how future progress in gravitational microlensing computation at high optical depth will rely on both hardware and software solutions, we compare a direct inverse ray-shooting code implemented on a graphics processing unit (GPU) with both a widely used hierarchical tree code on a single-core CPU and a recent implementation of a parallel tree code suitable for a CPU-based cluster supercomputer. We examine the accuracy of the tree codes through comparison with a direct code over a much wider range of parameter space than has been feasible before. We demonstrate that all three codes achieve comparable accuracy, and that the choice of approach depends on the scale and nature of the microlensing problem under investigation. On current hardware there is little difference in processing speed between the single-core CPU tree code and the GPU direct code; however, the recent plateau in single-core CPU speeds means the existing tree code is no longer able to take advantage of Moore’s-law-like increases in processing speed. Instead, we anticipate a rapid increase in GPU capabilities in the next few years, which is advantageous to the direct code. We suggest that progress in other areas of astrophysical computation may benefit from a transition to GPUs through the use of “brute force” algorithms, rather than attempting to port the current best solution directly to a GPU language – for certain classes of problems, the simple implementation on GPUs may already be no worse than an optimised single-core CPU version.
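The "brute force" approach the abstract refers to is easy to state concretely. The following is a minimal NumPy sketch of inverse ray shooting (not the authors' code): rays on a regular image-plane grid are deflected by every point-mass lens, and the density of rays landing in each source-plane pixel approximates the magnification map. All grid sizes, lens positions and units (Einstein radii) are illustrative assumptions.

```python
import numpy as np

def shoot_rays(lens_pos, lens_mass, n_rays=200, field=2.0, n_pix=20, src_field=2.0):
    """Shoot a grid of rays from the image plane through point-mass lenses and
    bin where they land on the source plane (counts ~ magnification).
    Lenses must not coincide exactly with ray-grid points in this simple sketch."""
    x = np.linspace(-field, field, n_rays)
    x1, x2 = np.meshgrid(x, x)                      # image-plane ray positions
    y1, y2 = x1.copy(), x2.copy()
    for (l1, l2), m in zip(lens_pos, lens_mass):    # O(N_rays * N_lens) deflection
        d1, d2 = x1 - l1, x2 - l2
        r2 = d1**2 + d2**2
        y1 -= m * d1 / r2
        y2 -= m * d2 / r2
    counts, _, _ = np.histogram2d(y1.ravel(), y2.ravel(),
                                  bins=n_pix, range=[[-src_field, src_field]] * 2)
    return counts

mag = shoot_rays([(0.0, 0.0)], [1.0])   # single lens at the origin
```

The inner loop is embarrassingly parallel over rays, which is exactly why this formulation maps so directly onto a GPU.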

2.
We present the results of gravitational direct N-body simulations using the commercial graphics processing units (GPUs) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with GRAPE-6Af special-purpose hardware. The force evaluation of the N-body problem was implemented in Cg, using the GPU directly to speed up the calculations. The integration of the equations of motion, running on the host computer, was implemented in C using a 4th-order predictor–corrector Hermite integrator with block time steps. We find that for a large number of particles (N ≳ 10^4) modern graphics processing units offer an attractive low-cost alternative to GRAPE special-purpose hardware. A modern GPU continues to give relatively flat scaling with the number of particles, comparable to that of the GRAPE. The GRAPE is designed to reach double precision, whereas the GPU is intrinsically single precision. For relatively large time steps, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, which is impressive given the single-precision nature of the GPU. For the same time steps, the GRAPE gave somewhat more accurate results, by about an order of magnitude. However, smaller time steps allowed better energy accuracy on the GRAPE, around 10^-11, whereas for the GPU machine precision saturates around 10^-6. For N ≳ 10^6 the GeForce 8800GTX was about 20 times faster than the host computer. Though still a factor of a few slower than GRAPE, modern GPUs outperform GRAPE in their low cost, long mean time between failures and much larger onboard memory; the GRAPE-6Af holds at most 256k particles, whereas the GeForce 8800GTX can hold 9 million particles in memory.
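The O(N^2) force evaluation that both the GPU and the GRAPE accelerate can be sketched in a few lines. This is an illustrative vectorised version, not the paper's Cg kernel; the softening length and G = 1 units are assumptions.

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Direct-summation gravitational accelerations (G = 1).
    pos: (N, 3) positions, mass: (N,) masses -> (N, 3) accelerations."""
    d = pos[None, :, :] - pos[:, None, :]     # d[i, j] = r_j - r_i
    r2 = (d**2).sum(-1) + eps**2              # Plummer-softened squared distance
    inv_r3 = r2**-1.5
    np.fill_diagonal(inv_r3, 0.0)             # exclude self-interaction
    return (d * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

# Two equal masses on the x-axis attract each other symmetrically.
a = accelerations(np.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]), np.ones(2))
```

Each of the N rows is independent, which is what makes the evaluation suitable for one-thread-per-particle GPU execution.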

3.
We describe an implementation of compressible inviscid fluid solvers with block-structured adaptive mesh refinement on Graphics Processing Units using NVIDIA’s CUDA. We show that a class of high resolution shock capturing schemes can be mapped naturally on this architecture. Using the method of lines approach with the second order total variation diminishing Runge–Kutta time integration scheme, piecewise linear reconstruction, and a Harten–Lax–van Leer Riemann solver, we achieve an overall speedup of approximately 10 times faster execution on one graphics card as compared to a single core on the host computer. We attain this speedup in uniform grid runs as well as in problems with deep AMR hierarchies. Our framework can readily be applied to more general systems of conservation laws and extended to higher order shock capturing schemes. This is shown directly by an implementation of a magneto-hydrodynamic solver and comparing its performance to the pure hydrodynamic case. Finally, we also combined our CUDA parallel scheme with MPI to make the code run on GPU clusters. Close to ideal speedup is observed on up to four GPUs.
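The scheme ingredients named in the abstract (piecewise-linear minmod reconstruction, an HLL Riemann solver, TVD Runge–Kutta time stepping) can be demonstrated on a 1-D scalar problem. The sketch below applies them to Burgers' equation on a periodic grid; this is a hand-written illustration, not the authors' solver, and the grid size and CFL number are arbitrary choices.

```python
import numpy as np

def minmod(a, b):
    """Slope limiter: smallest-magnitude slope, zero across extrema."""
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def rhs(u, dx):
    """Finite-volume residual for 1-D Burgers' equation, f(u) = u^2/2."""
    up, um = np.roll(u, -1), np.roll(u, 1)
    slope = minmod(u - um, up - u)
    uL = u + 0.5 * slope                    # left state at interface i+1/2
    uR = np.roll(u - 0.5 * slope, -1)       # right state at interface i+1/2
    fL, fR = 0.5 * uL**2, 0.5 * uR**2
    sL, sR = np.minimum(uL, uR), np.maximum(uL, uR)   # simple wave-speed bounds
    F = np.where(sL >= 0, fL,
        np.where(sR <= 0, fR,
                 (sR * fL - sL * fR + sL * sR * (uR - uL)) / (sR - sL + 1e-30)))
    return -(F - np.roll(F, 1)) / dx

def step(u, dx, dt):
    """Second-order TVD (SSP) Runge-Kutta step."""
    u1 = u + dt * rhs(u, dx)
    return 0.5 * (u + u1 + dt * rhs(u1, dx))

x = np.linspace(0.0, 1.0, 200, endpoint=False)
dx = x[1] - x[0]
u = np.sin(2 * np.pi * x) + 1.5
for _ in range(100):
    u = step(u, dx, 0.4 * dx / np.abs(u).max())
```

Because the update is conservative flux differencing on a periodic grid, the mean of `u` is preserved to round-off even after the shock forms.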

4.
We describe a parallel hybrid symplectic integrator for planetary system integration that runs on a graphics processing unit (GPU). The integrator identifies close approaches between particles and switches from symplectic to Hermite algorithms for particles that require higher resolution integrations. The integrator is approximately as accurate as other hybrid symplectic integrators but is GPU accelerated.

5.
A fully real-time coherent dedispersion system has been developed for the pulsar back-end at the Giant Metrewave Radio Telescope (GMRT). The dedispersion pipeline uses the single phased-array voltage beam produced by the existing GMRT software back-end (GSB) to produce coherently dedispersed intensity output in real time, for the currently operational bandwidths of 16 MHz and 32 MHz. Provision has also been made to coherently dedisperse voltage-beam data from observations recorded on disk. We discuss the design and implementation of the real-time coherent dedispersion system, describing the steps carried out to optimise the performance of the pipeline. Presently functioning on an Intel Xeon X5550 CPU equipped with an NVIDIA Tesla C2075 GPU, the pipeline allows dispersion-free, high time resolution data to be obtained in real time. We illustrate the significant improvements over the existing incoherent dedispersion system at the GMRT, and present some preliminary results obtained from studies of pulsars using this system, demonstrating its potential as a useful tool for low frequency pulsar observations. We describe the salient features of our implementation, comparing it with other recently developed real-time coherent dedispersion systems. This implementation of a real-time coherent dedispersion pipeline for a large, low frequency array instrument like the GMRT will enable long-term observing programs using coherent dedispersion to be carried out routinely at the observatory. We also outline the possible improvements for such a pipeline, including prospects for the upgraded GMRT which will have bandwidths about ten times larger than at present.
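The core operation of coherent dedispersion is a multiplication in the Fourier domain by the inverse of the interstellar dispersion transfer function. The sketch below shows the mechanics on a synthetic impulse: disperse it with a chirp, then recover it. This is an illustration only, not the GSB pipeline's code; sign and normalisation conventions for the chirp vary between implementations, and the dispersion measure, centre frequency and bandwidth are made-up values.

```python
import numpy as np

KDM = 4.148808e3   # dispersion constant, s MHz^2 pc^-1 cm^3

def dedisperse(volt, f0, bw, dm):
    """Coherently dedisperse complex baseband voltages.
    f0: centre frequency (MHz), bw: bandwidth (MHz), dm: dispersion measure."""
    n = volt.size
    f = np.fft.fftfreq(n, d=1.0 / bw)                       # offsets from f0 (MHz)
    # Phase of the inverse dispersion chirp at frequency f0 + f:
    phase = 2 * np.pi * KDM * 1e6 * dm * f**2 / (f0**2 * (f0 + f))
    return np.fft.ifft(np.fft.fft(volt) * np.exp(1j * phase))

n, f0, bw, dm = 4096, 400.0, 32.0, 10.0
impulse = np.zeros(n, complex)
impulse[n // 2] = 1.0
# Apply the forward dispersion chirp to simulate propagation through the ISM:
f = np.fft.fftfreq(n, d=1.0 / bw)
chirp = np.exp(-2j * np.pi * KDM * 1e6 * dm * f**2 / (f0**2 * (f0 + f)))
dispersed = np.fft.ifft(np.fft.fft(impulse) * chirp)
recovered = dedisperse(dispersed, f0, bw, dm)
```

Because the chirp has unit modulus, the operation conserves signal energy; the FFTs are the dominant cost, which is why they are the natural target for GPU offload.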

6.
We design a graphics processing unit (GPU) acceleration of the manifold-correction algorithm, based on the compute unified device architecture (CUDA), to simulate the dynamical evolution of the post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. Various numerical experiments confirm the feasibility and efficiency of parallel computation on the GPU. Numerical comparisons show that the manifold-correction method executed on the GPU agrees well with the same code executed purely on the central processing unit (CPU). Shared-memory and register optimisation techniques increase the GPU speedup enormously without additional hardware cost: for a phase-space scan of \(314 \times 314\) orbits, the GPU code is nearly 13 times faster than the CPU code. In addition, the GPU-accelerated manifold-correction method is used to study numerically how the spin-induced quadrupole–monopole interaction affects the dynamics of black-hole binary systems.
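The idea behind a manifold correction can be shown on a toy problem: after an ordinary (here deliberately low-order) integration step, the velocity is rescaled so the state is pulled back onto the energy manifold H(q, v) = v^2/2 + q^2/2 = E0. This is a sketch of the concept only; the paper's corrections act on the post-Newtonian Hamiltonian, and the harmonic oscillator, step size and scaling rule below are illustrative assumptions.

```python
import math

def step_corrected(q, v, dt, e0):
    """One crude explicit Euler step followed by a velocity-scaling correction."""
    q, v = q + dt * v, v - dt * q        # uncorrected Euler step (energy drifts)
    v2_target = 2.0 * e0 - q * q         # velocity squared demanded by H = E0
    if v2_target > 0.0:
        v = math.copysign(math.sqrt(v2_target), v)   # project back onto manifold
    return q, v

q, v = 1.0, 0.0
e0 = 0.5 * (q * q + v * v)
for _ in range(1000):
    q, v = step_corrected(q, v, dt=0.01, e0=e0)
energy = 0.5 * (q * q + v * v)
```

Plain explicit Euler on this problem grows the energy exponentially; with the per-step correction the orbit stays pinned to the E0 surface, which is the behaviour the manifold-correction literature exploits for long integrations.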

7.
8.
The increasing array size of radio astronomy interferometers is causing the associated computation to scale quadratically with the number of array signals. Consequently, efficient usage of alternate processing architectures should be explored in order to meet this computational challenge. Affordable parallel processors have been made available to the general scientific community in the form of the commodity graphics card. This work investigates the use of the Graphics Processing Unit in the parallelisation of the combined conjugate multiply and accumulation stage of a correlator for a radio astronomy array. Using NVIDIA’s Compute Unified Device Architecture, our testing shows processing speeds from one to two orders of magnitude faster than a Central Processing Unit approach.
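The conjugate multiply-and-accumulate (CMAC) stage itself is simple to write down, which is why it parallelises so well. A minimal NumPy sketch (array sizes and the random test data are illustrative assumptions, not the paper's configuration): for every antenna pair, multiply one channelised spectrum by the conjugate of the other and accumulate over time.

```python
import numpy as np

def cmac(spectra):
    """Conjugate multiply-and-accumulate for an FX correlator.
    spectra: (n_time, n_ant, n_chan) complex -> (n_ant, n_ant, n_chan) visibilities."""
    # vis[a, b, c] = sum over t of spectra[t, a, c] * conj(spectra[t, b, c])
    return np.einsum('tac,tbc->abc', spectra, spectra.conj())

rng = np.random.default_rng(1)
n_time, n_ant, n_chan = 64, 4, 8
s = (rng.normal(size=(n_time, n_ant, n_chan))
     + 1j * rng.normal(size=(n_time, n_ant, n_chan)))
vis = cmac(s)
```

The cost per accumulation scales as the number of antenna pairs, hence the quadratic growth with array size noted in the abstract; autocorrelations come out real and the visibility matrix is Hermitian, which is a useful sanity check on any implementation.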

9.
Under a smooth mass-distribution model, the critical curve of a strong gravitational lens system is a line of infinite magnification in the image plane. When the microlensing effect of a small amount of discrete mass is included, the magnification distribution on the source plane develops complex structure, providing an effective way to probe the composition of dark matter. Simulating microlensing near a critical curve is difficult because the magnification diverges on the critical curve and the computational cost is enormous: reaching the required simulation precision with a traditional direct ray-tracing algorithm would demand huge computing resources. We therefore develop a massively parallel Graphics Processing Unit (GPU) method to simulate the microlensing effect near critical curves. On an NVIDIA Tesla V100S PCIe 32 GB GPU, a simulation involving more than 13000 microlenses and of order 10^13 light rays takes about 7000 s. On top of the GPU parallelisation, introducing an interpolation approximation increases the computing speed by roughly two orders of magnitude relative to direct ray tracing. With this method we generate 80 magnification maps and extract 800 light curves from them, and derive statistics of the micro-caustic number density and the peak magnification.

10.
11.
Pulsar observations with the 40-m radio telescope of Yunnan Observatories produce a huge volume of data, which must be processed in real time to avoid a massive backlog. To achieve this, a graphics-processing-unit (GPU) architecture is used to decode, dedisperse and fold the Mark5B data. Experiments show that data sampled in real time at 8 MB per second can be processed within 0.51 s, meeting the real-time requirement. We first describe the GPU implementation of each part of the observing system, and then compare the computing speed of each part in detail against a traditional CPU architecture. For dedispersion, the most time-consuming part, we analyse how the amount of data per Fourier transform affects execution efficiency. The final output profiles and histograms show that the real-time processing results meet the requirements. Remaining problems and future work are discussed at the end.

12.
We present and discuss the characteristics and performance, in terms of both computational speed and precision, of a numerical code that integrates the equations of motion of N ‘particles’ interacting via Newtonian gravitation and moving in an external smooth galactic field. The force on every particle is evaluated by direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. The code, NBSymple, has been parallelised twice: the Compute Unified Device Architecture (CUDA) makes the all-pairs force evaluation as fast as possible on high-performance NVIDIA Tesla C1060 graphics processing units, while the O(N) computations are distributed over multiple CPUs by means of the OpenMP application programming interface. The code works in either single- or double-precision floating-point arithmetic. The use of single precision exploits the GPU performance fully but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code. The code is available at astrowww.phys.uniroma1.it/dolcetta/nbsymple.html.
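The "software reconstruction of double precision" mentioned here is commonly built from error-free transformations: a value is carried as an unevaluated sum of two single-precision floats (hi + lo). Below is a sketch of the building block, Knuth's two-sum, applied to a long accumulation in float32; the specific variables NBSymple protects this way are not stated here, so the accumulation example is purely illustrative.

```python
import numpy as np

def two_sum(a, b):
    """Error-free transformation in float32: returns (s, e) with
    s = fl(a + b) and a + b = s + e exactly."""
    a, b = np.float32(a), np.float32(b)
    s = np.float32(a + b)
    bp = np.float32(s - a)
    e = np.float32(np.float32(a - np.float32(s - bp)) + np.float32(b - bp))
    return s, e

# Accumulate 10^5 small terms: plain float32 accumulates rounding error,
# while the (hi, lo) pair retains the lost low-order bits.
hi, lo = np.float32(0.0), np.float32(0.0)
plain = np.float32(0.0)
term = np.float32(1e-4)
for _ in range(10**5):
    plain = np.float32(plain + term)
    hi, e = two_sum(hi, term)
    lo = np.float32(lo + e)
compensated = float(hi) + float(lo)     # "reconstructed" double-precision result
```

The trick buys most of the accuracy of double precision while keeping all arithmetic in the GPU's fast single-precision units, which is the trade-off the abstract describes.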

13.
We present a GPU-accelerated CUDA-C implementation of the Barnes–Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and is therefore fully MPI parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a Maclaurin spheroid to test the accuracy of our new implementation, and use spherical, collapsing cloud cores with effective AMR to carry out performance tests, also in comparison with previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup for full simulations of at least a factor of 1.6 and up to a factor of 10.

14.
When calculating the infrared spectral energy distributions (SEDs) of galaxies in radiation-transfer models, the calculation of dust grain temperatures is generally the most time-consuming part of the calculation. Because of its highly parallel nature, this calculation is perfectly suited for massively parallel general-purpose graphics-processing units (GPUs). This paper presents an implementation of the calculation of dust grain equilibrium temperatures on GPUs in the Monte-Carlo radiation transfer code sunrise, using the CUDA API. The GPU can perform this calculation 69 times faster than the eight CPU cores, showing great potential for accelerating calculations of galaxy SEDs.
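Per grain, the equilibrium-temperature calculation is a root-find: the temperature at which the grain re-emits exactly the power it absorbs. The sketch below assumes a grey grain (emissivity independent of wavelength) so the balance reduces to sigma*T^4 = absorbed flux; real codes such as sunrise integrate over the grain's wavelength-dependent absorption efficiency, so this is a simplified illustration only.

```python
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def equilibrium_temperature(absorbed_flux, t_lo=1.0, t_hi=3000.0, tol=1e-9):
    """Solve SIGMA * T**4 = absorbed_flux (W m^-2) for T by bisection."""
    def residual(t):
        return SIGMA * t**4 - absorbed_flux      # monotonically increasing in t
    for _ in range(200):
        t_mid = 0.5 * (t_lo + t_hi)
        if residual(t_mid) > 0.0:
            t_hi = t_mid
        else:
            t_lo = t_mid
        if t_hi - t_lo < tol:
            break
    return 0.5 * (t_lo + t_hi)

# A grain bathed in roughly the solar constant (~1361 W m^-2):
t_grain = equilibrium_temperature(1361.0)
```

Since every grain's root-find is independent of every other's, the whole grid of grains can be solved with one GPU thread per grain, which is the parallel structure the abstract exploits.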

15.
16.
As large astronomical facilities come into operation, the traditional development model faces problems such as duplicated development of programs and conflicting environment dependencies. Moreover, a cluster is a tightly coupled computing resource, and a severe environment conflict can render an entire cluster unusable. To solve this problem, a new pipeline framework based on the microservice concept was developed, which allows new pipelines to be developed and deployed within a short time. We introduce the ONSET data pipeline built with this framework; to achieve quasi-real-time data processing, the core programs were optimised with MPI and GPU techniques, and the final performance was evaluated. The results show that this development model can build a pipeline that meets the requirements within a short time, and that it offers a useful reference for future multi-band, multi-telescope astronomical data processing.

17.
Internal layers in ice masses can be detected with ice-penetrating radar. In a flowing ice mass, each horizon represents a past surface that has been subsequently buried by accumulation, and strained by ice flow. These layers retain information about relative spatial patterns of accumulation and ablation (mass balance). Internal layers are necessary to accurately infer mass-balance patterns because the ice-surface shape only weakly reflects spatial variations in mass balance. Additional rate-controlling information, such as the layer age, the ice temperature, or the ice-grain sizes and ice-crystal fabric, can be used to infer the absolute rate of mass balance. To infer mass balance from the shapes of internal layers, we solve an inverse problem. The solution to the inverse problem is the best set or sets of unknown boundary conditions or initial conditions that, when used in our calculation of ice-surface elevation and internal-layer shape, generate appropriate predictions of observations that are available. We also show that internal layers can be used to infer martian paleo-surface topography from a past era of ice flow, even though the topography may have been largely altered by subsequent erosion. We have successfully inferred accumulation rates and surface topography from internal layers in Antarctica. Using synthetic data, we demonstrate the ability of this method to solve the corresponding inverse problem to infer accumulation and ablation rates, as well as the surface topography, for martian ice. If past ice flow has affected the shapes of martian internal layers, this method is necessary to infer the spatial pattern and rate of mass balance.

18.
We present a hybrid combination of forward and inverse reconstruction methods using multiple observations of a coronal mass ejection (CME) to derive the three-dimensional (3D) “true” height–time plots for individual CME components. We apply this hybrid method to the components of the 31 December 2007 CME. This CME, observed clearly in both the STEREO A and STEREO B COR2 white-light coronagraphs, evolves asymmetrically across the 15-solar-radius field of view within a span of three hours. The method has two reconstruction steps. We fit a boundary envelope for the potential 3D CME shape using a flux-rope-type model oriented to best match the observations. Using this forward model as a constraining envelope, we then run an inverse reconstruction, solving for the simplest underlying 3D electron density distribution that can, when rendered, reproduce the observed coronagraph data frames. We produce plots for each segment to establish the 3D or “true” height–time plots for each center of mass as well as for the bulk CME motion, and we use these plots along with our derived density profiles to estimate the CME’s asymmetric expansion rate.

19.
A graphics card implementation of a test-particle simulation code is presented that is based on the CUDA extension of the C/C++ programming language. The original CPU version has been developed for the calculation of cosmic-ray diffusion coefficients in artificial Kolmogorov-type turbulence. In the new implementation, the magnetic turbulence generation, which is the most time-consuming part, is separated from the particle transport and is performed on a graphics card. In this article, the modification of the basic approach of integrating test particle trajectories to employ the SIMD (single instruction, multiple data) model is presented and verified. The efficiency of the new code is tested and several language-specific accelerating factors are discussed. For the example of isotropic magnetostatic turbulence, sample results are shown and a comparison to the results of the CPU implementation is performed.
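The quantity such test-particle codes ultimately deliver is the running diffusion coefficient, kappa(t) = <(Δx)^2>/(2t), estimated over an ensemble of trajectories. The sketch below illustrates only that final statistic: simple random scattering velocities stand in for full orbit integration in synthetic turbulence, and all ensemble sizes and step parameters are made-up values.

```python
import numpy as np

rng = np.random.default_rng(42)
n_particles, n_steps, dt = 2000, 500, 0.1

# Stand-in transport model: velocity of +-1 re-drawn each step (pitch-angle
# scattering caricature), instead of integrating orbits in turbulent fields.
v = rng.choice([-1.0, 1.0], size=(n_steps, n_particles))
x = np.cumsum(v * dt, axis=0)                   # particle displacements vs time
t = dt * np.arange(1, n_steps + 1)

# Running diffusion coefficient from the ensemble mean-square displacement:
kappa = (x**2).mean(axis=1) / (2.0 * t)
```

For this scattering model kappa(t) settles at dt/2; in the real application the same ensemble average is taken over trajectories integrated through the pre-generated turbulent magnetic field, with one GPU thread per particle.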

20.
We present 594 radial velocity measurements for 71 white dwarfs obtained during our search for binary white dwarfs and not reported elsewhere. We identify three excellent candidate binaries, which require further observations to confirm our preliminary estimates for their orbital periods, and one other good candidate. We investigate whether our data support the existence of a population of single, low-mass (≲0.5 M☉) white dwarfs (LMWDs). These stars are difficult to explain using standard models of stellar evolution. We find that a model with a mixed single/binary population is at least ~20 times more likely to explain our data than a pure binary population. This result depends on assumed period distributions for binary LMWDs, assumed companion masses and several other factors. Therefore, the evidence in favour of the existence of a population of single LMWDs is not sufficient, in our opinion, to firmly establish the existence of such a population, but does suggest that extended observations of LMWDs to obtain a more convincing result would be worthwhile.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号