Similar Documents
20 similar documents found.
1.
To assess how future progress in gravitational microlensing computation at high optical depth will rely on both hardware and software solutions, we compare a direct inverse ray-shooting code implemented on a graphics processing unit (GPU) with both a widely-used hierarchical tree code on a single-core CPU, and a recent implementation of a parallel tree code suitable for a CPU-based cluster supercomputer. We examine the accuracy of the tree codes through comparison with a direct code over a much wider range of parameter space than has been feasible before. We demonstrate that all three codes present comparable accuracy, and the choice of approach depends on considerations relating to the scale and nature of the microlensing problem under investigation. On current hardware, there is little difference in the processing speed of the single-core CPU tree code and the GPU direct code; however, the recent plateau in single-core CPU speeds means the existing tree code is no longer able to take advantage of Moore’s law-like increases in processing speed. Instead, we anticipate a rapid increase in GPU capabilities in the next few years, which is advantageous to the direct code. We suggest that progress in other areas of astrophysical computation may benefit from a transition to GPUs through the use of “brute force” algorithms, rather than attempting to port the current best solution directly to a GPU language – for certain classes of problems, the simple implementation on GPUs may already be no worse than an optimised single-core CPU version.

2.
We present a GPU accelerated CUDA-C implementation of the Barnes–Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and is therefore fully MPI-parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a Maclaurin spheroid to test the accuracy of our new implementation and use spherical, collapsing cloud cores with effective AMR to carry out performance tests, also in comparison with previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup for full simulations of at least a factor of 1.6 and up to a factor of 10.
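
As a minimal, serial sketch of the Barnes–Hut idea the abstract refers to (not the FLASH4/GPU implementation): a distant cell is replaced by its monopole whenever cell size divided by distance falls below an opening angle. The opening angle, softening and particle setup below are arbitrary illustrative choices.

import numpy as np

G, THETA, EPS = 1.0, 0.5, 1e-3                     # units, opening angle, softening (assumed)

class Cell:
    def __init__(self, centre, size, idx, pos, mass):
        self.size = size
        self.mass = mass[idx].sum()
        self.com = (pos[idx] * mass[idx, None]).sum(axis=0) / self.mass
        self.children = []
        if len(idx) > 1:                           # refine until one particle per leaf
            octant = ((pos[idx] > centre) * np.array([1, 2, 4])).sum(axis=1)
            for o in range(8):
                sub = idx[octant == o]
                if len(sub):
                    shift = np.array([(o >> k) & 1 for k in range(3)]) - 0.5
                    self.children.append(Cell(centre + shift * size / 2, size / 2,
                                              sub, pos, mass))

def accel(cell, x):
    d = cell.com - x
    r = np.sqrt(d @ d + EPS**2)
    if not cell.children or cell.size / r < THETA: # far enough: use the monopole
        return G * cell.mass * d / r**3
    return sum(accel(c, x) for c in cell.children) # otherwise open the cell

rng = np.random.default_rng(1)
pos = rng.uniform(-1.0, 1.0, (200, 3))
mass = np.full(200, 1.0 / 200)
root = Cell(np.zeros(3), 4.0, np.arange(200), pos, mass)
print("acceleration at the origin:", accel(root, np.zeros(3)))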

3.
A test particle code is employed to explore the dynamics of charged particles and perpendicular diffusion in a turbulent magnetic field, using a three-dimensional (3D) isotropic turbulence model. The perpendicular diffusion obtained at different particle energies is compared with that of the nonlinear guiding center (NLGC) theory. It is found that the NLGC theory is consistent with the test particle simulations when the particle energies are small. However, the difference between the NLGC theory and the test particle simulations tends to increase when the particle energy is sufficiently large, and the threshold is related to the turbulence bend-over length. In the NLGC theory, the gyrocenter of a charged particle is assumed to follow the magnetic field line. When the particle energy is sufficiently large, its gyroradius exceeds the turbulence bend-over length; the particle can then cross magnetic field lines, and the difference between the test particle simulations and the NLGC theory appears.
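
The trajectory integration behind such test particle codes can be illustrated with a generic Boris pusher in a uniform background field (the paper's actual integrator and turbulent field model are not reproduced here). In this toy setup the measured gyroradius can be compared against a turbulence bend-over length to see when the field-line-following picture must break down; field strength, time step and velocities are illustrative.

import numpy as np

q_over_m, B0, dt, nsteps = 1.0, 1.0, 0.01, 5000     # illustrative values
B = np.array([0.0, 0.0, B0])                        # uniform background field
x = np.zeros(3)
v = np.array([1.0, 0.0, 0.1])                       # v_perp = 1, small v_parallel
traj = np.empty((nsteps, 3))

for n in range(nsteps):
    t = q_over_m * B * dt / 2                       # Boris rotation vectors (E = 0)
    s = 2 * t / (1 + t @ t)
    v_prime = v + np.cross(v, t)
    v = v + np.cross(v_prime, s)
    x = x + v * dt
    traj[n] = x

r_measured = 0.5 * (traj[:, 0].max() - traj[:, 0].min())
r_expected = np.linalg.norm(v[:2]) / (q_over_m * B0)  # Larmor radius v_perp / (q B / m)
print("gyroradius from orbit:", r_measured, "  expected:", r_expected)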

4.
5.
We describe an implementation of compressible inviscid fluid solvers with block-structured adaptive mesh refinement on Graphics Processing Units using NVIDIA’s CUDA. We show that a class of high resolution shock capturing schemes can be mapped naturally on this architecture. Using the method of lines approach with the second order total variation diminishing Runge–Kutta time integration scheme, piecewise linear reconstruction, and a Harten–Lax–van Leer Riemann solver, we achieve an overall speedup of approximately 10 times faster execution on one graphics card as compared to a single core on the host computer. We attain this speedup in uniform grid runs as well as in problems with deep AMR hierarchies. Our framework can readily be applied to more general systems of conservation laws and extended to higher order shock capturing schemes. This is shown directly by an implementation of a magnetohydrodynamic solver and comparing its performance to the pure hydrodynamic case. Finally, we combined our CUDA parallel scheme with MPI to run the code on GPU clusters. Close to ideal speedup is observed on up to four GPUs.
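
The ingredients listed in the abstract (method of lines, second-order TVD Runge–Kutta, piecewise linear reconstruction with a minmod limiter, and an HLL flux) can be sketched in one dimension for Burgers' equation. This is a serial illustration only, not the CUDA/AMR solver; grid size, CFL number and initial data are arbitrary.

import numpy as np

N, L, cfl, t_end = 400, 1.0, 0.4, 0.2
dx = L / N
x = (np.arange(N) + 0.5) * dx
u = np.where(x < 0.5, 1.0, 0.0)                      # Riemann-type initial data

def flux(u):
    return 0.5 * u**2                                # Burgers flux

def minmod(a, b):
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def rhs(u):
    slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
    uL = u + 0.5 * slope                             # left state at interface i+1/2
    uR = np.roll(u - 0.5 * slope, -1)                # right state at interface i+1/2
    sL, sR = np.minimum(uL, uR), np.maximum(uL, uR)  # HLL wave-speed estimates
    FL, FR = flux(uL), flux(uR)
    Fhll = (sR * FL - sL * FR + sL * sR * (uR - uL)) / np.where(sR > sL, sR - sL, 1.0)
    F = np.where(sL >= 0, FL, np.where(sR <= 0, FR, Fhll))
    return -(F - np.roll(F, 1)) / dx                 # conservative update, periodic BC

t = 0.0
while t < t_end:
    dt = min(cfl * dx / max(np.abs(u).max(), 1e-12), t_end - t)
    u1 = u + dt * rhs(u)                             # second-order TVD (Heun) Runge-Kutta
    u = 0.5 * (u + u1 + dt * rhs(u1))
    t += dt

print("total 'mass' (conserved):", u.sum() * dx)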

6.
Various radio observations have shown that the hot atmospheres of galaxy clusters are magnetized. However, our understanding of the origin of these magnetic fields, their implications on structure formation and their interplay with the dynamics of the cluster atmosphere, especially in the centres of galaxy clusters, is still very limited. In preparation for the upcoming new generation of radio telescopes (like the Expanded Very Large Array, Long Wavelength Array, Low Frequency Array and Square Kilometer Array), a huge effort is being made to learn more about cosmological magnetic fields from the observational perspective. Here we present the implementation of magnetohydrodynamics (MHD) in the cosmological smoothed particle hydrodynamics (SPH) code gadget. We discuss the details of the implementation and various schemes to suppress numerical instabilities as well as regularization schemes, in the context of cosmological simulations. The performance of the SPH–MHD code is demonstrated in various one- and two-dimensional test problems, which we performed with a fully three-dimensional set-up to test the code under realistic circumstances. Comparing with solutions obtained using athena, we find excellent agreement with our SPH–MHD implementation. Finally, we apply our SPH–MHD implementation to galaxy cluster formation within a large, cosmological box. Performing a resolution study we demonstrate the robustness of the predicted shape of the magnetic field profiles in galaxy clusters, which is in good agreement with previous studies.
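
As a minimal illustration of the SPH machinery such a code builds on (not the gadget SPH–MHD implementation itself), the sketch below estimates densities by direct kernel summation with the standard cubic spline kernel; particle number, box and smoothing length are arbitrary choices.

import numpy as np

def w_cubic(r, h):
    """Monaghan cubic spline kernel in 3D, support 2h."""
    q = r / h
    sigma = 1.0 / (np.pi * h**3)
    return sigma * np.where(q < 1, 1 - 1.5 * q**2 + 0.75 * q**3,
                            np.where(q < 2, 0.25 * (2 - q)**3, 0.0))

def sph_density(pos, mass, h):
    """rho_i = sum_j m_j W(|r_i - r_j|, h), by direct summation for brevity."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return (mass[None, :] * w_cubic(d, h)).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 1.0, (500, 3))      # unit box, 500 equal-mass particles
mass = np.full(500, 1.0 / 500)             # total mass 1 -> mean density near 1
rho = sph_density(pos, mass, h=0.1)
print("mean SPH density ≈", rho.mean())    # slightly below 1 because of edge effects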

7.
8.
9.
We describe a new implementation of a parallel TreeSPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using shmem, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Supercomputing Center (Bologna, Italy).
The code combines the smoothed particle hydrodynamics (SPH) method for solving hydrodynamical equations with the popular Barnes & Hut tree code to perform the gravity calculation with an N log N scaling, and it is based on the scalar TreeSPH code developed by Carraro et al. Parallelization is achieved by distributing particles among processors according to a workload criterion.
Benchmarks of the code, in terms of load balance and scalability, are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2×10^4 particles on 8 processors. The code remains balanced at better than the 95 per cent level. As the number of processors increases, the load balance worsens slightly. The deviation from perfect scalability for an increasing number of processors is almost negligible up to 32 processors. Finally, we present a simulation of the formation of an X-ray galaxy cluster in a flat cold dark matter cosmology, using 2×10^5 particles and 32 processors, and compare our results with Evrard's P3M–SPH simulations.
Additionally we have incorporated radiative cooling, star formation, feedback from SNe of types II and Ia, stellar winds and UV flux from massive stars, and an algorithm to follow the chemical enrichment of the interstellar medium. Simulations with some of these ingredients are also presented.
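
A toy sketch of distributing particles over processors by a workload criterion, using a greedy least-loaded assignment; the actual TreeSPH criterion and cost model are not specified in the abstract, so the per-particle cost below is a made-up proxy.

import heapq
import numpy as np

def distribute(work, n_proc):
    """Assign each particle to the currently least-loaded processor, heaviest first."""
    heap = [(0.0, p) for p in range(n_proc)]         # (current load, processor id)
    heapq.heapify(heap)
    owner = np.empty(len(work), dtype=int)
    for i in np.argsort(work)[::-1]:
        load, p = heapq.heappop(heap)
        owner[i] = p
        heapq.heappush(heap, (load + work[i], p))
    return owner

rng = np.random.default_rng(2)
work = rng.lognormal(mean=0.0, sigma=1.0, size=20000)   # made-up per-particle cost
owner = distribute(work, n_proc=8)
loads = np.array([work[owner == p].sum() for p in range(8)])
print("max/mean load:", loads.max() / loads.mean())     # close to 1, i.e. >95 per cent balanced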

10.
We explore numerically the transport of energetic particles in a turbulent magnetic field configuration. A test-particle code is employed to compute running diffusion coefficients as well as particle distribution functions in the different directions of space. Our numerical findings are compared with models commonly used in diffusion theory such as Gaussian distribution functions and solutions of the cosmic ray Fokker–Planck equation. Furthermore, we compare the running diffusion coefficients across the mean magnetic field with solutions obtained from the time-dependent version of the unified non-linear transport theory. In most cases we find that particle distribution functions are indeed of Gaussian form as long as a two-component turbulence model is employed. For turbulence setups with reduced dimensionality, however, the Gaussian distribution can no longer be obtained. It is also shown that the unified non-linear transport theory agrees with simulated perpendicular diffusion coefficients as long as the pure two-dimensional model is excluded.
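
The running diffusion coefficient compared in such studies is typically kappa_xx(t) = <(Delta x)^2> / (2t) over an ensemble of trajectories. The sketch below computes that estimator for simple uncorrelated random walks standing in for the full test-particle orbits; step size and particle numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
n_part, n_step, dt, v = 1000, 2000, 0.01, 1.0
steps = rng.choice([-1.0, 1.0], size=(n_part, n_step)) * v * dt
x = np.cumsum(steps, axis=1)                        # displacement of each particle
t = (np.arange(n_step) + 1) * dt
kappa_run = (x**2).mean(axis=0) / (2 * t)           # running diffusion coefficient

# for this uncorrelated walk kappa levels off at v^2 dt / 2
print("late-time kappa ≈", kappa_run[-100:].mean(), "  expected:", v**2 * dt / 2)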

11.
We review the implementation of individual particle time-stepping for N-body dynamics. We present a class of integrators derived from second order Hamiltonian splitting. In contrast to the usual implementation of individual time-stepping, these integrators are momentum conserving and show excellent energy conservation in conjunction with a symmetrized time step criterion. We use an explicit but approximate formula for the time symmetrization that is compatible with the use of individual time steps, so no iterative scheme is necessary. We implement these ideas in the HUAYNO code and present tests showing that the integration schemes achieve good energy conservation, with little or no systematic drift, while conserving momentum and angular momentum to machine precision in long-term integrations.
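
A constant-step kick-drift-kick leapfrog illustrates the symplectic, momentum-conserving building block that such Hamiltonian-splitting schemes generalize; the HUAYNO integrators themselves use per-particle steps and a symmetrized step criterion, which this sketch does not reproduce. Masses, orbit and step size are illustrative.

import numpy as np

def accel(x):
    """Pairwise Newtonian accelerations, G = 1, unit masses."""
    a = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(len(x)):
            if i != j:
                d = x[j] - x[i]
                a[i] += d / np.linalg.norm(d)**3
    return a

x = np.array([[-0.5, 0.0], [0.5, 0.0]])             # two unit masses
v = np.array([[0.0, -0.5], [0.0, 0.5]])             # bound, eccentric orbit
dt = 1e-3
E0 = 0.5 * (v**2).sum() - 1.0 / np.linalg.norm(x[0] - x[1])

for _ in range(50000):
    v += 0.5 * dt * accel(x)                         # kick
    x += dt * v                                      # drift
    v += 0.5 * dt * accel(x)                         # kick

E = 0.5 * (v**2).sum() - 1.0 / np.linalg.norm(x[0] - x[1])
print("relative energy error:", abs((E - E0) / E0))  # bounded, no secular drift
print("total momentum:", v.sum(axis=0))              # conserved to machine precision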

12.
We compare simulations of the Lyman α forest performed with two different hydrodynamical codes, gadget-2 and enzo. A comparison of the dark matter power spectrum for simulations run with identical initial conditions shows differences of 1–3 per cent at the scales relevant for quantitative studies of the Lyman α forest. This allows a meaningful comparison of the effect of the different implementations of the hydrodynamic part of the two codes. Using the same cooling and heating algorithm in both codes, the differences in the temperature and the density probability distribution function are of the order of 10 per cent. The differences are comparable to the effects of box size and resolution on these statistics. When self-converged results for each code are taken into account, the differences in the flux power spectrum – the statistic most widely used for estimating the matter power spectrum and cosmological parameters from Lyman α forest data – are about 5 per cent. This is again comparable to the effects of box size and resolution. Numerical uncertainties due to a particular implementation of solving the hydrodynamic or gravitational equations appear therefore to contribute only moderately to the error budget in estimates of the flux power spectrum from numerical simulations. We further find that the differences in the flux power spectrum for enzo simulations run with and without adaptive mesh refinement are also of the order of 5 per cent or smaller. The latter require 10 times less CPU time, making the CPU time requirement similar to that of a version of gadget-2 that is optimized for Lyman α forest simulations.
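
The flux power spectrum referred to above is the 1D power spectrum of the flux contrast delta_F = F/<F> - 1 along a line of sight. The sketch below applies that estimator to a synthetic skewer; the skewer itself, its length, pixel count and the normalization convention are assumptions and differ between papers.

import numpy as np

rng = np.random.default_rng(4)
n, box = 1024, 80.0                                 # pixels, skewer length (assumed units)
dx = box / n

# toy "flux" skewer: smoothed random optical depth, F = exp(-tau)
tau = np.clip(rng.normal(0.3, 0.2, n), 0.0, None)
tau = np.convolve(tau, np.ones(8) / 8, mode="same")
F = np.exp(-tau)

delta_F = F / F.mean() - 1.0                        # flux contrast
dk = np.fft.rfft(delta_F) * dx                      # discretized Fourier transform
k = 2 * np.pi * np.fft.rfftfreq(n, d=dx)
P_F = np.abs(dk)**2 / box                           # 1D flux power spectrum estimate

for ki, pi in list(zip(k, P_F))[1:6]:
    print(f"k = {ki:6.3f}   P_F = {pi:.4e}")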

13.
We have developed a parallel Particle–Particle, Particle–Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle–particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.
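
The long-range component described above can be sketched as a particle-mesh step: deposit mass on a grid (nearest-grid-point here for brevity, where production codes use higher-order assignment), solve Poisson's equation by FFT with the -4 pi G / k^2 Green's function, and difference the potential. Grid size, box and particle count are illustrative.

import numpy as np

ng, box, G = 64, 1.0, 1.0
rng = np.random.default_rng(5)
pos = rng.uniform(0.0, box, (10000, 3))
mass = np.full(len(pos), 1.0 / len(pos))

# nearest-grid-point mass assignment -> density field
cell = (pos / box * ng).astype(int) % ng
rho = np.zeros((ng, ng, ng))
np.add.at(rho, (cell[:, 0], cell[:, 1], cell[:, 2]), mass)
rho /= (box / ng)**3

# Poisson solve in Fourier space: phi_k = -4 pi G rho_k / k^2
k1 = 2 * np.pi * np.fft.fftfreq(ng, d=box / ng)
kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
k2 = kx**2 + ky**2 + kz**2
phi_k = np.where(k2 > 0, -4 * np.pi * G * np.fft.fftn(rho) / np.where(k2 > 0, k2, 1.0), 0.0)
phi = np.real(np.fft.ifftn(phi_k))

# mesh force = -grad(phi), centred differences on the periodic grid
h = box / ng
gx = -(np.roll(phi, -1, axis=0) - np.roll(phi, 1, axis=0)) / (2 * h)
print("rms mesh acceleration (x component):", gx.std())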

14.
The role of magnetohydrodynamic (MHD) turbulence in the cosmic ray acceleration process in a volume with a reconnecting magnetic field is studied by means of Monte Carlo simulations. We performed modelling of proton acceleration, with the three-dimensional analytic model of stationary reconnection of Craig et al. providing the unperturbed background conditions. Perturbations of particle trajectories resulting from a turbulent magnetic field component were simulated using small-amplitude pitch-angle momentum scattering, enabling modelling of both small- and large-amplitude turbulence in a wide wavevector range. Within the approach, no second-order Fermi acceleration process is allowed. Comparison of the acceleration process in models involving particle trajectory perturbations with the unperturbed model reveals that the turbulence can substantially increase the acceleration efficiency, enabling much higher final particle energies and flat particle spectra.
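
Small-amplitude pitch-angle scattering of the kind described can be sketched as repeated small random rotations of the momentum vector about an axis perpendicular to it, which preserves |p| and therefore injects no energy by itself. The maximum scattering angle below is an arbitrary choice, not the paper's prescription.

import numpy as np

rng = np.random.default_rng(6)

def scatter(p, dtheta_max):
    """Rotate p by a random small angle about a random axis perpendicular to p."""
    a = rng.normal(size=3)
    a -= a.dot(p) * p / p.dot(p)                    # project out the parallel part
    a /= np.linalg.norm(a)
    th = dtheta_max * rng.random()
    # Rodrigues rotation of p about the unit axis a by angle th
    return p * np.cos(th) + np.cross(a, p) * np.sin(th) + a * a.dot(p) * (1 - np.cos(th))

p = np.array([0.0, 0.0, 1.0])                       # initial momentum along the mean field
mu = []
for _ in range(20000):
    p = scatter(p, dtheta_max=0.05)
    mu.append(p[2] / np.linalg.norm(p))             # pitch-angle cosine
print("|p| after scattering:", np.linalg.norm(p))   # preserved to rounding error
print("mean pitch-angle cosine:", np.mean(mu))      # isotropizes toward 0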

15.
Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However, as an easily parallelisable calculation, gravitational ray-shooting can be accelerated using programmable graphics processing units (GPUs). We present our implementation of inverse ray-shooting for the NVIDIA G80 generation of graphics processors using the NVIDIA Compute Unified Device Architecture (CUDA) software development kit. We also extend our code to multiple GPU systems, including a 4-GPU NVIDIA S1070 Tesla unit. We achieve sustained processing performance of 182 Gflop/s on a single GPU, and 1.28 Tflop/s using the Tesla unit. We demonstrate that billion-lens microlensing simulations can be run on a single computer with a Tesla unit in timescales of order a day without the use of a hierarchical tree-code.
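
The core of inverse ray-shooting is to map a regular grid of image-plane rays to the source plane with the point-mass lens equation and histogram where they land; ray counts per source pixel then trace the magnification. Below is a small serial NumPy sketch of that mapping (lens positions, masses and map sizes are arbitrary), not the CUDA implementation.

import numpy as np

rng = np.random.default_rng(7)
n_lens, n_ray, n_pix = 200, 1024, 200
lenses = rng.uniform(-8.0, 8.0, (n_lens, 2))        # point-mass positions
m = np.full(n_lens, 1.0)                            # masses in Einstein-radius units

# regular grid of rays in the image plane
g = np.linspace(-5.0, 5.0, n_ray)
x1, x2 = np.meshgrid(g, g)
rays = np.column_stack([x1.ravel(), x2.ravel()])

# lens equation: y = x - sum_i m_i (x - x_i) / |x - x_i|^2   (no smooth matter term)
y = rays.copy()
for xl, ml in zip(lenses, m):                        # loop over lenses to limit memory
    d = rays - xl
    y -= ml * d / (d**2).sum(axis=1, keepdims=True)

counts, _, _ = np.histogram2d(y[:, 0], y[:, 1], bins=n_pix, range=[[-5, 5], [-5, 5]])
rays_per_pixel_unlensed = len(rays) / n_pix**2       # what a pixel would get without lensing
print("peak magnification ≈", counts.max() / rays_per_pixel_unlensed)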

16.
We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high performance computing (HPC) systems and thread-oriented programming. PHoToNs adopts a hybrid scheme to compute the gravitational force, with the conventional Particle-Mesh (PM) algorithm for the long-range force, the Tree algorithm for the short-range force, and the direct-summation Particle-Particle (PP) algorithm for gravity from very close particles. A self-similar, space-filling Peano–Hilbert curve is used to decompose the computing domain. Thread programming is used to flexibly manage domain communication, the PM calculation and synchronization, as well as the dual tree traversal on the CPU+MIC platform. PHoToNs scales well, and the efficiency of the PP kernel achieves 68.6% of peak performance on the MIC and 74.4% on the CPU platform. We also test the accuracy of the code against the widely used Gadget-2 code and find excellent agreement.
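
Hybrid PM + Tree(+PP) schemes usually rest on splitting the 1/r^2 force into an erfc-damped short-range piece, summed by the tree or direct PP part, and a smooth long-range remainder handled on the mesh. The sketch below evaluates that split; the splitting scale r_s and the exact kernel used in PHoToNs are assumptions.

import math
import numpy as np

G, m, r_s = 1.0, 1.0, 0.25                          # illustrative values; r_s = split scale

def f_total(r):
    return G * m / r**2

def f_short(r):
    """Short-range (tree / PP) part of the erfc-split force."""
    x = r / (2 * r_s)
    return G * m / r**2 * (math.erfc(x) + 2 * x / math.sqrt(math.pi) * math.exp(-x**2))

for r in np.linspace(0.1, 2.0, 8):
    fs = f_short(r)
    fl = f_total(r) - fs                            # smooth remainder for the PM mesh
    print(f"r = {r:4.2f}   short-range = {fs:9.4f}   long-range = {fl:7.4f}")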

17.
We present a tree code for simulations of collisional systems dominated by a central mass. We describe the implementation of the code and the results of some test runs with which the performance of the code was tested. A comparison between the behaviour of the tree code and a direct hybrid integrator is also presented. The main result is that tree codes can be useful in numerical simulations of planetary accretion, especially during intermediate stages, where possible runaway accretion and dynamical friction lead to a population with a few large bodies in low-eccentricity and low-inclination orbits embedded in a large swarm of small planetesimals in rather excited orbits. Some strategies to improve the performance of the code are also discussed.
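
In a system dominated by a central mass, the Keplerian term from the star can be evaluated exactly for every body while only the mutual planetesimal perturbations need an approximate (tree) treatment. The sketch below shows that force structure with a direct sum standing in for the tree part; masses, softening and the disc setup are illustrative.

import numpy as np

def accelerations(x, m, eps=1e-4):
    a = -x / np.linalg.norm(x, axis=1, keepdims=True)**3       # exact Kepler term, star at origin
    for i in range(len(x)):                                    # mutual perturbations (tree in the paper)
        d = x - x[i]
        r3 = (np.linalg.norm(d, axis=1)**2 + eps**2)**1.5
        r3[i] = np.inf                                         # no self-interaction
        a[i] += (m[:, None] * d / r3[:, None]).sum(axis=0)
    return a

rng = np.random.default_rng(8)
n = 300
r = rng.uniform(0.8, 1.2, n)
phi = rng.uniform(0.0, 2 * np.pi, n)
x = np.column_stack([r * np.cos(phi), r * np.sin(phi), 1e-3 * rng.normal(size=n)])
m = np.full(n, 1e-9)                                           # planetesimal masses << M_star = 1
a = accelerations(x, m)
print("mean |a| (close to the Kepler value 1/r^2):", np.linalg.norm(a, axis=1).mean())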

18.
When calculating the infrared spectral energy distributions (SEDs) of galaxies in radiation-transfer models, the calculation of dust grain temperatures is generally the most time-consuming part of the calculation. Because of its highly parallel nature, this calculation is perfectly suited for massively parallel general-purpose graphics-processing units (GPUs). This paper presents an implementation of the calculation of dust grain equilibrium temperatures on GPUs in the Monte Carlo radiation transfer code sunrise, using the CUDA API. The GPU can perform this calculation 69 times faster than eight CPU cores, showing great potential for accelerating calculations of galaxy SEDs.
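
The per-cell problem is to find the temperature at which the emitted power, 4*pi times the integral of kappa_nu * B_nu(T) over frequency, balances the absorbed power; each cell is independent, which is what makes the calculation GPU-friendly. The sketch below solves it by bisection with a toy power-law opacity, not sunrise's actual grain model, and the absorbed power is invented for the example.

import numpy as np

h, c, k_B = 6.626e-34, 2.998e8, 1.381e-23          # SI constants

nu = np.logspace(10, 14, 2000)                      # frequency grid [Hz]
kappa = 1.0 * (nu / 1e12)**1.5                      # toy opacity [m^2 kg^-1], beta = 1.5

def planck(nu, T):
    return 2 * h * nu**3 / c**2 / np.expm1(np.minimum(h * nu / (k_B * T), 700.0))

def integrate(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoid rule

def emitted(T):
    return 4 * np.pi * integrate(kappa * planck(nu, T), nu)   # W per kg of dust

def equilibrium_T(absorbed, lo=2.0, hi=2000.0, tol=1e-3):
    """Bisection on the monotonically increasing function emitted(T)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if emitted(mid) < absorbed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

absorbed = emitted(25.0)                            # pretend the MC step absorbed this much
print("recovered grain temperature ≈", equilibrium_T(absorbed), "K")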

19.
A theory is presented for the dynamics of dust particles in an incompressible turbulent fluid. Grain-gas coupling occurs through friction forces that are proportional to the mean grain velocity relative to the gas. This test particle theory is applied to the case of a Kolmogoroff spectrum in a protostellar cloud. The mean turbulence-induced grain velocity and the mean turbulent relative velocity of two grains are calculated. Whereas the former should determine the dust scale height, grain-grain collisions are influenced by the latter. For a reasonable strength of the turbulence, the mean induced relative velocity of two particles turns out to be at least as large as the corresponding terminal velocity difference during gravitational settling. Paper presented at the Conference on Protostars and Planets, held at the Planetary Science Institute, University of Arizona, Tucson, Arizona, between January 3 and 7, 1978.
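
The friction coupling described (a drag force proportional to the grain velocity relative to the gas) can be illustrated by integrating dv/dt = -(v - u_gas)/t_f against a toy fluctuating gas velocity with a fixed eddy turnover time; well coupled grains track the gas while poorly coupled grains retain a smaller random velocity. This is a toy numerical sketch, not the paper's analytic theory, and all numbers are illustrative.

import numpy as np

rng = np.random.default_rng(9)
dt, n_step, t_eddy, u_rms = 0.01, 100000, 1.0, 1.0

def grain_vrms(t_f):
    u, v, acc = 0.0, 0.0, 0.0
    for n in range(n_step):
        # Ornstein-Uhlenbeck gas velocity with correlation time t_eddy and std u_rms
        u += (-u / t_eddy) * dt + u_rms * np.sqrt(2 * dt / t_eddy) * rng.normal()
        v = u + (v - u) * np.exp(-dt / t_f)          # exact drag relaxation over one step
        if n >= n_step // 2:                         # average over the second half
            acc += v * v
    return np.sqrt(acc / (n_step - n_step // 2))

for t_f in [0.01, 0.1, 1.0, 10.0]:
    print(f"friction time t_f = {t_f:5.2f}   grain v_rms ≈ {grain_vrms(t_f):.2f}   (gas v_rms = 1)")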

20.
We describe a new implementation of a parallel Tree-SPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using SHMEM, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Super-computing Center (Bologna, Italy). The code combines the smoothed particle hydrodynamics (SPH) method to solve hydrodynamical equations with the popular Barnes and Hut (1986) tree-code to perform the gravity calculation with an N log N scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al. (1998). Parallelization is achieved by distributing particles among processors according to a workload criterion. Benchmarks of the code, in terms of load balance and scalability, are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2 × 10^4 particles on eight processors. The code turns out to be balanced at more than the 95% level. If the number of processors is increased, the load balance worsens slightly. The deviation from perfect scalability at an increasing number of processors is negligible up to 64 processors. Additionally we have incorporated radiative cooling, star formation, feedback and an algorithm to follow the chemical enrichment of the interstellar medium.

