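Constructing Virtual Observatory Data Nodes with the MapReduce Framework
Cite this article: Song Xuan, Zhou Wei, Han Jizhong, Cui Chenzhou. Constructing Virtual Observatory Data Nodes with the MapReduce Framework [J]. Publications of the Yunnan Observatory, 2012(2):150-156.
Authors: Song Xuan  Zhou Wei  Han Jizhong  Cui Chenzhou
Affiliations: [1] Beijing Planetarium, Beijing 100044 [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 [3] National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012
Funding: Supported by the National Natural Science Foundation of China (10820002; 60920010; 90912005).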
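Abstract: MapReduce is a large-scale distributed parallel processing framework. It was first used for massive data processing in Internet services and has gradually spread to many other fields. Virtual observatories now face ever-growing volumes of astronomical data from ground-based and space telescopes. To improve the capacity of China Virtual Observatory data nodes to process massive astronomical data, this paper proposes, for the first time, a method for building China Virtual Observatory data nodes on the MapReduce framework, and describes the implementation using batch catalog cross-identification as an example. The performance evaluation shows that building virtual observatory data nodes on the MapReduce framework brings gains in performance, scalability, and cost.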

Keywords: MapReduce  China Virtual Observatory  Cross-identification

Constructing Data Nodes of the China-VO with the MapReduce
Song Xuan, Zhou Wei, Han Jizhong, Cui Chenzhou. Constructing Data Nodes of the China-VO with the MapReduce [J]. Publications of the Yunnan Observatory, 2012(2):150-156.
Authors:Song Xuan  Zhou Wei  Han Jizhong  Cui Chenzhou
Institution: 1. Beijing Planetarium, Beijing 100044, China; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China
Abstract: MapReduce is a distributed parallel processing model and execution environment for large data sets. It was initially applied to massive data in web services, and its applications have since extended to a variety of areas. Virtual Observatory projects now face an increasingly massive amount of astronomical data from ground-based and space telescopes. To improve the processing capacity of the astronomical data center in the China Virtual Observatory, this paper proposes a new approach to constructing data nodes with MapReduce. The approach has three steps: it translates an astronomical query into a standard SQL query, turns that query into a MapReduce job, and outputs the results in standard astronomical data formats. These three steps integrate MapReduce into the China Virtual Observatory. Because cross-identification between object catalogs is performed only once, most of the MapReduce run time is spent on indexing and computing over the data. We implement object cross-identification on the MapReduce framework, and our performance evaluation shows that MapReduce-based cross-identification outperforms the traditional DBMS-based approach. Our results also show that the MapReduce-based framework achieves not only good performance but also scalability and low cost.
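The abstract does not give the paper's algorithm, so the following is only a minimal sketch of how catalog cross-identification might be phrased as a map step and a reduce step. It assumes a zone-based partitioning by declination, which is a common technique but is not confirmed as the authors' method. The names `ZONE_HEIGHT_DEG`, `map_phase`, `reduce_phase`, and `cross_identify` are hypothetical, and the shuffle is simulated in-process with a dictionary rather than run on a real MapReduce cluster.

```python
import math
from collections import defaultdict

# Hypothetical zone height in degrees; it must be >= the match radius so that
# any matching pair lies in the same or in directly adjacent zones.
ZONE_HEIGHT_DEG = 0.5


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def zone_of(dec):
    """Index of the declination zone that contains `dec`."""
    return int(math.floor((dec + 90.0) / ZONE_HEIGHT_DEG))


def map_phase(catalog_name, records):
    """Emit (zone, tagged record) pairs.

    Catalog "A" records go to their own zone only. Catalog "B" records are
    also copied to both neighbouring zones, so pairs that straddle a zone
    border still meet in exactly one reducer (the zone of the "A" record).
    """
    for obj_id, ra, dec in records:
        z = zone_of(dec)
        zones = (z,) if catalog_name == "A" else (z - 1, z, z + 1)
        for key in zones:
            yield key, (catalog_name, obj_id, ra, dec)


def reduce_phase(values, radius_deg):
    """Within one zone, pair every "A" object with each "B" object in range."""
    a_side = [v for v in values if v[0] == "A"]
    b_side = [v for v in values if v[0] == "B"]
    for _, id_a, ra_a, dec_a in a_side:
        for _, id_b, ra_b, dec_b in b_side:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                yield id_a, id_b


def cross_identify(cat_a, cat_b, radius_deg):
    """Run map, an in-memory shuffle, and reduce; return sorted ID pairs."""
    if radius_deg > ZONE_HEIGHT_DEG:
        raise ValueError("match radius must not exceed the zone height")
    groups = defaultdict(list)  # stands in for the framework's shuffle step
    for name, catalog in (("A", cat_a), ("B", cat_b)):
        for key, value in map_phase(name, catalog):
            groups[key].append(value)
    matches = set()
    for values in groups.values():
        matches.update(reduce_phase(values, radius_deg))
    return sorted(matches)
```

Each catalog here is a list of `(object_id, ra_deg, dec_deg)` tuples, so a call such as `cross_identify(cat_a, cat_b, 1.0 / 3600.0)` would look for counterparts within one arcsecond. Copying only one catalog into neighbouring zones keeps every candidate pair in a single reducer, which avoids duplicate matches at the cost of replicating that catalog threefold.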
Keywords:MapReduce  China-VO  Cross-identification
This article is indexed in VIP (维普) and other databases.