
Spark Parallel Optimization of Spectral Clustering for Large-Scale Datasets

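Cite this article:	Lü Honglin, Yin Qingshan. Spark parallel optimization of spectral clustering for large-scale datasets[J]. Bulletin of Surveying and Mapping, 2019, 0(12): 96-100.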

Authors:	Lü Honglin, Yin Qingshan

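Affiliations:	Liaoning University of International Business and Economics, Dalian, Liaoning 116052; Liaoning University of International Business and Economics, Dalian, Liaoning 116052; Jilin University, Changchun, Jilin 130000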

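Funding:	Doctoral Research Start-up Fund of Liaoning University of International Business and Economics (2019XJLXBSJJ002); Scientific Research Project of the Education Department of Liaoning Province (ldxy2017008)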

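Abstract:	Existing parallel spectral clustering algorithms for large-scale datasets are slow and consume large amounts of resources. Building on Spark, a parallel framework that supports both batch processing and graph computation, this paper proposes an improved, parallel-optimized spectral clustering algorithm for large-scale datasets. Parallel one-way iteration avoids repeated computation of data when the similarity matrix is built; parallel position transformation, scalar multiplication replacement, and distance scaling reduce the algorithm's resource usage; and substituting approximate eigenvectors further reduces its computational cost. Experimental results verify the effectiveness of the approximate eigenvectors and show good clustering performance and scalability on large-scale datasets.
Keywords:	large-scale datasets; spectral clustering; approximate eigenvectors; Spark parallel framework; K-means; distance calculation optimization
Received:	2019-06-24
Revised:	2019-10-30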
Spark parallel optimization large-scale spectral clustering

Lü Honglin,YIN Qingshan.Spark parallel optimization large-scale spectral clustering[J].Bulletin of Surveying and Mapping,2019,0(12):96-100.

Authors:	Lü Honglin YIN Qingshan

Institution:	1. Liaoning University of International Business and Economics, Dalian 116052, China;2. College of Mining Engineering, Jilin University, Changchun 130000, China

Abstract:	Existing parallel spectral clustering algorithms for large-scale datasets suffer from long computation times and heavy resource usage. To address these problems, an improved parallel optimization algorithm for spectral clustering, based on Spark, is proposed. Parallel one-way iteration avoids repeated computation of data when the similarity matrix is built; parallel position transformation, scalar multiplication replacement, and distance scaling reduce resource usage; and the use of approximate eigenvectors further reduces the amount of computation. Experimental results verify the effectiveness of the approximate eigenvectors and show good clustering performance and scalability on large-scale datasets.

Keywords:	large-scale spectral clustering; approximate eigenvector; Spark parallel computing; K-means; distance calculation optimization
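
The abstract says that one-way iteration avoids repeated computation when the similarity matrix is built, but it does not give the procedure. The sketch below illustrates one plausible reading, not the authors' implementation: because a similarity matrix is symmetric, each pair of points needs to be visited only once (i < j) and the value can be mirrored. It runs on a single machine in plain Python rather than on Spark, and the Gaussian kernel, the `sigma` parameter, the zero diagonal, and the function name `gaussian_similarity` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def gaussian_similarity(points, sigma=1.0):
    """Build a symmetric Gaussian similarity matrix, visiting each pair once.

    Assumption: similarity is exp(-||x - y||^2 / (2 * sigma^2)) and the
    diagonal is left at 0, a common convention in spectral clustering.
    """
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    # combinations() yields only pairs with i < j, so no pair is computed twice.
    for i, j in combinations(range(n), 2):
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        value = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sim[i][j] = value
        sim[j][i] = value  # mirror the value instead of recomputing it
    return sim


# Hypothetical toy data: three 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
S = gaussian_similarity(points)
```

For n points this evaluates n(n-1)/2 kernel values instead of n^2, which is the kind of saving the abstract attributes to one-way iteration. A distributed version would additionally have to decide how pairs are partitioned across workers, which this sketch does not address.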
This article is indexed in CNKI, Wanfang Data, and other databases.