Performance tuning of N-body codes on modern microprocessors: I. Direct integration with a Hermite scheme on x86_64 architecture
Institution: 1. Department of Astronomy, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; 2. National Astronomical Observatory of Japan, Mitaka, Tokyo 181-8588, Japan; 3. Institute for Advanced Study, Princeton, NJ 08540, USA
Abstract: The main performance bottleneck of gravitational N-body codes is the force calculation between pairs of particles. We have sped up this pairwise force calculation by factors between 2 and 10, depending on the code and on the processor on which it runs. These speed-ups were obtained by writing highly tuned code for x86_64 microprocessors. Any existing N-body code running on these chips can easily incorporate our assembly-language routines. In this paper we outline our overall approach and illustrate it with one specific example: a Hermite scheme for direct N²-type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops in double precision. In subsequent papers we will discuss other variations, including combinations with N log N codes, single-precision implementations, and performance on other microprocessors.
This article is indexed in ScienceDirect and other databases.