Research on Star/Galaxy Classification Based on Stacking Ensemble Learning
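Cite this article: LI Chao, ZHANG Wen-hui, LI Ran, WANG Jun-yi, LIN Ji-ming. Research on Star/Galaxy Classification Based on Stacking Ensemble Learning[J]. Acta Astronomica Sinica, 2020, 61(2): 21-111.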
Authors: LI Chao, ZHANG Wen-hui, LI Ran, WANG Jun-yi, LIN Ji-ming
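Affiliations: College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004 | Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004; Guangxi Colleges and Universities Key Laboratory of Satellite Navigation and Position Sensing, Guilin 541004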
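Funding: Supported by the National Natural Science Foundation of China (61966007), the Development Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL180201), the Director's Fund of the Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (GXKL06180107), the Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, and the Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (1716).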
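Abstract: Machine learning has achieved great success in many fields, but its predictive performance often depends on the specific problem. Ensemble learning predicts by combining multiple base classifiers, so it adapts well to a variety of scenarios and achieves high classification accuracy. To address the low classification accuracy on the darkest (faintest) source magnitude set in star/galaxy classification for the Sloan Digital Sky Survey (SDSS), a star/galaxy classification algorithm based on Stacking ensemble learning is proposed. The complete photometric data set is obtained from SDSS Data Release 7 (SDSS-DR7) and divided by magnitude into a bright source magnitude set, a dark source magnitude set, and a darkest source magnitude set. Only the darkest source magnitude set, which is the most complex and difficult to classify, is studied. First, 10-fold nested cross-validation is applied to the darkest source magnitude set; then Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and other algorithms are used to build the base classifiers, with a Gradient Boosting Decision Tree (GBDT) as the meta-classifier. Finally, using the galaxy classification accuracy and other metrics, the results are compared with those of Function Tree (FT), SVM, RF, GBDT, XGBoost, Stacked Denoising AutoEncoders (SDAE), Deep Belief Network (DBN), Deep Perception Decision Tree (DPDT), and other models. The experiments show that, on the darkest source magnitude set, the Stacking ensemble learning model improves galaxy classification accuracy by nearly 10% over the FT algorithm. It also improves considerably on the other traditional machine learning algorithms, strong boosting algorithms, and deep learning algorithms.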

Keywords: stars: fundamental parameters; galaxies: fundamental parameters; techniques: photometric; methods: data analysis
Received: 2019-12-13

Research on Star/Galaxy Classification Based on Stacking Ensemble Learning
LI Chao, ZHANG Wen-hui, LI Ran, WANG Jun-yi, LIN Ji-ming. Research on Star/Galaxy Classification Based on Stacking Ensemble Learning[J]. Acta Astronomica Sinica, 2020, 61(2): 21-111.
Authors: LI Chao, ZHANG Wen-hui, LI Ran, WANG Jun-yi, LIN Ji-ming
Institution: College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004 | Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004 | College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004; Guangxi Colleges and Universities Key Laboratory of Satellite Navigation and Position Sensing, Guilin 541004
Abstract: Machine learning has achieved great success in many areas, but its predictive performance often depends on the specific problem. Ensemble learning predicts results by combining multiple base classifiers, so it adapts well to various scenarios and achieves high classification accuracy. To address the low classification accuracy on the darkest (faintest) source magnitude set in star/galaxy classification for the Sloan Digital Sky Survey (SDSS), a star/galaxy classification algorithm based on Stacking ensemble learning is proposed in this paper. The complete photometric data set is obtained from SDSS Data Release 7 (SDSS-DR7) and divided by magnitude into a bright source magnitude set, a dark source magnitude set, and a darkest source magnitude set. Only the darkest source magnitude set, which is the most complex and difficult to classify, is studied. First, ten-fold nested cross-validation is applied to the darkest source magnitude set; then the Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) algorithms, among others, are used to build the base classifiers, and the Gradient Boosting Decision Tree (GBDT) is used as the meta-classifier. Finally, using galaxy classification accuracy and other metrics, the classification results are compared with those of Function Tree (FT), SVM, RF, GBDT, XGBoost, Stacked Denoising Autoencoders (SDAE), Deep Belief Network (DBN), Deep Perception Decision Tree (DPDT), and other models, and then analyzed. The experimental results show that the Stacking ensemble learning model improves galaxy classification accuracy on the darkest source magnitude set by nearly 10% compared with the FT algorithm. Compared with other traditional machine learning algorithms, strong boosting algorithms, and deep learning algorithms, the Stacking ensemble learning model also shows varying degrees of improvement.
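The abstract compares models by galaxy classification accuracy but does not define it. A small sketch follows, assuming the metric is the fraction of true galaxies that the classifier labels as galaxies (per-class recall, often called completeness); the function name `class_accuracy` and the toy labels are hypothetical.

```python
def class_accuracy(y_true, y_pred, label):
    """Fraction of objects whose true class is `label` that were predicted as `label`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(1 for t in y_true if t == label)
    if total == 0:
        raise ValueError(f"no objects with true class {label!r}")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total


# Toy labels, not real survey data.
y_true = ["galaxy", "galaxy", "star", "galaxy", "star"]
y_pred = ["galaxy", "star", "star", "galaxy", "galaxy"]

galaxy_acc = class_accuracy(y_true, y_pred, "galaxy")  # 2 of 3 galaxies recovered
star_acc = class_accuracy(y_true, y_pred, "star")      # 1 of 2 stars recovered
print(galaxy_acc, star_acc)
```

Reporting the metric per class matters on faint samples, where overall accuracy can hide poor performance on one of the two classes.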
Keywords: stars: fundamental parameters; galaxies: fundamental parameters; techniques: photometric; methods: data analysis