Improvement in recognition accuracy of minority crops by resampling of imbalanced training datasets of remote sensing



Cite this article:	樊东东, 李强子, 王红岩, 张源, 杜鑫, 沈宇. 通过训练样本采样处理改善小宗作物遥感识别精度[J]. 遥感学报, 2019, 23(4): 730-742.

Authors:	樊东东, 李强子, 王红岩, 张源, 杜鑫, 沈宇

Institution:	1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101; 2. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049. 樊东东 and 沈宇: 1 and 2; all other authors: 1.

Funding:	National Natural Science Foundation of China (No. 41571422)

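Abstract:	The quality of training samples is a key factor in the accuracy of crop identification from remote sensing. The development of high-spatial-resolution satellites has effectively addressed the mixed-pixel problem in crop identification. However, when the planted areas of different crops in a region differ greatly, the number of training samples per class often differs greatly as well. Such an imbalanced dataset affects classifier training and leads to unsatisfactory identification accuracy for the minority classes. To study the imbalanced-sample problem in remote sensing crop identification, this paper used GF-2 satellite data. It first extracted the spectral and texture information of ground objects and selected features with Recursive Feature Elimination (RFE). Then, from a data-processing perspective, it applied five sampling algorithms to the imbalanced training set. Finally, it trained classifiers on the resampled, balanced datasets and compared the results of two classifiers, decision tree and AdaBoost (Adaptive Boosting), before and after resampling. The findings were: (1) after resampling, both classification algorithms clearly improved the classification accuracy of minority crops; (2) classifier performance improved most after ADASYN (Adaptive Synthetic Sampling) resampling, with the kappa coefficient of the decision tree increasing by 14.32% and that of AdaBoost by 10.23%, reaching the highest value of 0.9336; (3) oversampling outperformed undersampling, improving classifier performance more. In summary, choosing appropriate sampling and classification methods is an effective way to improve remote sensing classification accuracy on imbalanced datasets.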
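The contrast between undersampling and oversampling described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract names only ADASYN among its five sampling algorithms, and the function names, the toy two-band data, and the random pairing used for interpolation are all assumptions made for illustration. SMOTE and ADASYN interpolate toward nearest neighbours of a minority sample, which this simplified version does not do.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Collect the feature vectors belonging to each class label."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        for x in rng.sample(g, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def interpolate_oversample(samples, labels, seed=0):
    """Grow every class to the size of the largest class.

    New points are interpolated between two randomly chosen samples of the
    same class (a simplification; SMOTE and ADASYN use nearest neighbours).
    """
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(g) for g in groups.values())
    out_x, out_y = list(samples), list(labels)
    for y, g in groups.items():
        for _ in range(target - len(g)):
            a, b = rng.choice(g), rng.choice(g)
            t = rng.random()
            out_x.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 8 "major crop" pixels and 2 "minor crop" pixels,
# each described by two made-up feature values.
X = [[0.1 * i, 0.5] for i in range(8)] + [[0.9, 0.1], [1.0, 0.2]]
y = ["major"] * 8 + ["minor"] * 2

Xu, yu = random_undersample(X, y)
Xo, yo = interpolate_oversample(X, y)
print(Counter(yu))  # class counts after undersampling
print(Counter(yo))  # class counts after oversampling
```

Undersampling balances the classes by throwing away six of the eight majority samples, while oversampling keeps all original samples and adds synthetic minority points. That difference is one plausible reading of why the abstract reports oversampling as the more effective option.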
Keywords:	crop identification; imbalanced datasets; sampling; remote sensing; minority crops; GF-2 (Gaofen-2)
Received:	2017/11/11

FAN Dongdong, LI Qiangzi, WANG Hongyan, ZHANG Yuan, DU Xin and SHEN Yu. Improvement in recognition accuracy of minority crops by resampling of imbalanced training datasets of remote sensing[J]. Journal of Remote Sensing, 2019, 23(4): 730-742.

Authors:	FAN Dongdong, LI Qiangzi, WANG Hongyan, ZHANG Yuan, DU Xin and SHEN Yu

Institution:	1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China; 2. College of Resources and Environmental Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. FAN Dongdong and SHEN Yu: 1 and 2; all other authors: 1.

Abstract:	The rapid development of high-spatial-resolution satellites has effectively alleviated the mixed-pixel problem in satellite images, making it possible to extract the fine-scale distribution of crops. Classifying remote sensing images is a quick way to obtain accurate agricultural information. However, the accuracy of supervised classification is affected by several factors, such as the classification algorithm and the input datasets. Imbalanced training samples, in which some categories have considerably fewer or more training samples than others, often result in poor classification accuracy for the minority classes. To address this problem and improve the generalization performance of classifiers, this research focused on the proper use of resampling techniques and classification methods for remote sensing image classification. We extracted spectral and texture features from the images and selected an optimized feature subset by recursive feature elimination. Then, five resampling methods, namely three oversampling methods and two undersampling methods, were applied separately to balance the initial training datasets. Finally, we trained two classifiers (decision tree and AdaBoost) on the resampled datasets and evaluated each in terms of kappa coefficient, overall accuracy, producer's accuracy, and user's accuracy. The overall classification accuracy and kappa coefficient improved considerably after resampling for both the decision tree (14.32%) and the AdaBoost classifier (10.23%). AdaBoost obtained the highest kappa coefficient (0.9336) when trained on the dataset resampled with ADASYN. Resampling the training datasets also increased the classification accuracy of minority crops.
Meanwhile, the feature selection results showed that vegetation and texture indexes contributed more to classification than the original reflectance features. Oversampling methods were better at relieving the influence of imbalanced training samples on the classifiers. Resampling the training datasets is markedly beneficial to classifier performance when the training datasets are severely imbalanced. The detailed accuracy assessment shows that oversampling outperforms undersampling: some significant samples are lost during undersampling, whereas useful information is added by oversampling. The AdaBoost classifier handles imbalanced training datasets better than the decision tree. Combining a proper resampling approach with a compatible classifier can significantly improve the accuracy of minority classes when classifying imbalanced datasets.
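The four evaluation measures named in the abstract (kappa coefficient, overall accuracy, producer's accuracy, and user's accuracy) can all be derived from a confusion matrix. The sketch below shows the standard definitions under stated assumptions: rows hold reference classes, columns hold mapped classes, every class appears at least once in both, and the function name and the toy matrix are hypothetical rather than taken from the paper.

```python
def accuracy_report(confusion):
    """Compute accuracy measures from a square confusion matrix.

    confusion[i][j] is the number of reference samples of class i that the
    classifier mapped to class j (rows = reference, columns = mapped).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]  # reference totals per class
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # mapped totals
    diag = [confusion[i][i] for i in range(n)]  # correctly classified counts

    overall = sum(diag) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [d / r for d, r in zip(diag, row_sums)]  # correct / reference total
    users = [d / c for d, c in zip(diag, col_sums)]      # correct / mapped total
    return overall, kappa, producers, users


# Hypothetical two-class matrix: class 0 is a major crop, class 1 a minor crop.
confusion = [
    [90, 10],
    [5, 15],
]
overall, kappa, producers, users = accuracy_report(confusion)
print(f"overall={overall:.3f} kappa={kappa:.3f}")
print("producer's:", [round(p, 3) for p in producers])
print("user's:", [round(u, 3) for u in users])
```

In this toy matrix the overall accuracy is 0.875, yet the minority class reaches only 0.75 producer's accuracy and 0.60 user's accuracy, which is why per-class measures and kappa are useful alongside overall accuracy on imbalanced data.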

Keywords:	crop recognition; imbalanced datasets; resampling; remote sensing; minority crops; GF-2
This article is indexed in CNKI and other databases.