Software Engineering (软件工程)

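Cite this article:

YANG Jianbing. Study on the Application of Improved K-means Clustering Algorithm in the Data Analysis of Bus IC Cards [J]. Software Engineering, 2019, 22(5): 32-34.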


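Study on the Application of Improved K-means Clustering Algorithm in the Data Analysis of Bus IC Cards

YANG Jianbing

(Nantong Science and Technology College, Nantong, Jiangsu 226007, China)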

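Abstract: To address the random selection of initial cluster centers in the traditional k-means algorithm, an improved k-means algorithm is proposed. First, a weight is defined, equal to the sample density multiplied by the inter-cluster distance and divided by the average distance between samples within a cluster. Cluster centers are chosen by maximum weight, which overcomes the instability of selecting them at random. The algorithm is then parallelized with the MapReduce framework on the Hadoop platform. Finally, taking Nantong bus IC card swipe records as an example, the improved k-means clustering algorithm is used to analyze the records. Experiments show that the improved k-means algorithm runs stably and reliably on the Hadoop platform and produces good clustering results.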
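The weight-based choice of initial centers can be illustrated with a minimal Python sketch. The abstract gives only the form of the weight (density times inter-cluster distance, divided by average intra-cluster distance), so the concrete definitions below are assumptions made for illustration, not the paper's own: density is the number of other samples within a radius equal to the mean pairwise distance, the intra-cluster term is the mean distance to those neighbors, and the inter-cluster term is the distance to the nearest sample of higher density. The function name `weighted_init` and its parameters are hypothetical.

```python
import numpy as np

def weighted_init(X, k, radius=None):
    """Pick up to k initial centers by maximum weight.

    weight = density * separation / compactness, where all three terms
    are ASSUMED definitions (the abstract does not spell them out).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if radius is None:
        # Assumed neighborhood radius: mean of all pairwise distances.
        radius = dist[np.triu_indices(n, k=1)].mean()
    neighbors = dist <= radius               # each row includes the point itself
    density = neighbors.sum(axis=1) - 1      # other points within the radius
    # Mean distance to neighbors (assumed "intra-cluster average distance").
    compact = np.array([
        dist[i, neighbors[i]].sum() / max(density[i], 1) for i in range(n)
    ])
    # Distance to the nearest denser point (assumed "inter-cluster distance");
    # a point with no denser point uses its largest distance to any sample.
    separation = np.empty(n)
    for i in range(n):
        higher = density > density[i]
        separation[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    weight = density * separation / np.maximum(compact, 1e-12)

    available = np.ones(n, dtype=bool)
    centers = []
    while len(centers) < k and available.any():
        idx = np.flatnonzero(available)
        best = idx[np.argmax(weight[idx])]   # highest-weight remaining sample
        centers.append(X[best])
        available &= ~neighbors[best]        # drop its whole neighborhood
    return np.array(centers)

# Three small, well-separated groups of five points each.
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
X = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
centers = weighted_init(X, k=3)
```

Because the selection is deterministic, repeated runs on the same data return the same starting centers, which is the stability property the abstract attributes to the maximum-weight rule. The function may return fewer than `k` centers if the neighborhoods of the chosen samples cover the whole data set.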

Keywords: MapReduce; improved k-means algorithm; k-means; clustering

CLC number: TP301    Document code: A

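Funding: This work was supported by the Nantong Science and Technology Funding Project "Application of BP Neural Network Technology in Intelligent Bus IC Cards" (Project No. MS12017026-4).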

Study on the Application of Improved K-means Clustering Algorithm in the Data Analysis of Bus IC Cards

YANG Jianbing

(Nantong Science and Technology College, Nantong 226007, China)

Abstract: To address the random selection of initial cluster centers in the traditional k-means algorithm, this paper proposes an improved k-means algorithm. First, a weight is defined, equal to the sample density multiplied by the distance between clusters and divided by the average distance within the cluster. Cluster centers are determined by the maximum weight, which overcomes the instability caused by choosing them at random. The algorithm is then parallelized under the MapReduce framework on the Hadoop platform. Finally, taking Nantong bus IC card records as an example, the improved k-means clustering algorithm is used to analyze the records. Experiments show that the improved k-means algorithm is stable and reliable on the Hadoop platform and gives good clustering results.
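The MapReduce parallelization mentioned above can be sketched in plain Python. This is not Hadoop code and not the paper's implementation; it only imitates the usual division of labor for k-means, assuming the map step assigns each record to its nearest center and the reduce step averages the records that share a key. The function names `map_phase`, `reduce_phase`, and `kmeans_mapreduce` are hypothetical.

```python
import math
from collections import defaultdict

def map_phase(points, centers):
    """Map: emit (index of nearest center, (point, 1)) for every record."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        yield nearest, (p, 1)

def reduce_phase(pairs, centers):
    """Reduce: sum points and counts per key, then average into new centers."""
    sums = {}
    counts = defaultdict(int)
    for key, (p, c) in pairs:
        if key not in sums:
            sums[key] = list(p)
        else:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
        counts[key] += c
    # A center that received no points keeps its previous position.
    return [
        tuple(s / counts[j] for s in sums[j]) if j in sums else tuple(centers[j])
        for j in range(len(centers))
    ]

def kmeans_mapreduce(points, centers, max_iter=20, tol=1e-9):
    """Repeat map and reduce rounds until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
result = kmeans_mapreduce(points, centers=[(0, 0), (10, 0)])
```

On a real cluster each map task would see only its own split of the records, and a combiner would typically pre-aggregate the partial sums and counts before the shuffle. The per-key sum and count used here are what make that pre-aggregation possible.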

Keywords: MapReduce; improved k-means algorithm; k-means; clustering
