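Cite this article: HUANG Qihang, RU Xin, DAI Ning, YU Bo, CHEN Wei, XU Yushan. Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method[J]. Software Engineering, 2024, 27(7): 22-27.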
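Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method
HUANG Qihang1, RU Xin1, DAI Ning1, YU Bo1, CHEN Wei2, XU Yushan3
(1.School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2.Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing 312500, China;
3.Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing 312500, China)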
2801554196@qq.com; zhitingna@126.com; 990713260@qq.com; angle_xb@163.com; 287270195@qq.com; 1193570378@qq.com
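Abstract: To address the low data quality and high data redundancy that arise during data collection in weaving workshops, a comprehensive data cleaning method based on cluster analysis is proposed. First, the energy consumption of a textile enterprise's workshops is analyzed hierarchically, and an identification method based on the bisecting K-means algorithm is proposed for abnormal data. Second, for missing data, a variety of imputation methods are applied so that data with different characteristics can be imputed; for high data redundancy, the coefficient of determination is introduced to deduplicate the dataset and reduce its redundancy. Finally, simulation experiments are carried out on operating data from the workshop of a textile enterprise. The results show that after deduplication the data volume of the dataset falls by 83%, while the mean absolute percentage error in prediction experiments on the dataset fluctuates within a range of less than 2%. The method therefore reduces data redundancy while preserving the reliability of prediction.
Keywords: data cleaning; clustering; anomaly detection; deduplication
CLC number: TP111.8    Document code: A
Funding: Zhejiang Provincial Science and Technology Program Project (2022C01202)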
Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method
HUANG Qihang1, RU Xin1, DAI Ning1, YU Bo1, CHEN Wei2, XU Yushan3
(1.School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2.Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing 312500, China;
3.Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing 312500, China)
Abstract: To address the low data quality and high data redundancy that arise during data collection in weaving workshops, this paper proposes a comprehensive data cleaning method based on cluster analysis. First, the energy consumption of textile enterprise workshops is analyzed hierarchically, and a method based on the bisecting K-means algorithm is proposed for identifying abnormal data. Second, for missing data, a variety of imputation methods are used to fill in data with different characteristics; for high data redundancy, the coefficient of determination is introduced to deduplicate the dataset and reduce its redundancy. Finally, simulation experiments are conducted on the operating data of a textile enterprise workshop. The results show that after deduplication the data volume of the dataset is reduced by 83%, and the mean absolute percentage error in the dataset prediction experiments fluctuates within a range of less than 2%. The method reduces data redundancy while preserving the reliability of prediction.
Keywords: data cleaning; clustering; anomaly detection; deduplication
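
The abstract names two building blocks without giving their details: bisecting K-means for spotting abnormal readings and the mean absolute percentage error for judging prediction quality. The following is a minimal Python sketch of both on one-dimensional readings, using only the standard library. It is not the authors' implementation. The function names, the sample readings, the parameters `k` and `min_share`, and above all the rule that points in very small clusters count as abnormal are assumptions made for illustration.

```python
from statistics import mean


def sse(points):
    """Sum of squared errors of 1-D points around their mean."""
    if not points:
        return 0.0
    center = mean(points)
    return sum((p - center) ** 2 for p in points)


def two_means(points, iters=50):
    """Split 1-D points into two clusters with plain 2-means.

    Centroids start at the minimum and maximum, so the split is deterministic.
    If all points are equal, the second cluster is returned empty.
    """
    lo, hi = min(points), max(points)
    left, right = list(points), []
    for _ in range(iters):
        new_left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        new_right = [p for p in points if abs(p - lo) > abs(p - hi)]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        new_lo, new_hi = mean(left), mean(right)
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return left, right


def bisecting_kmeans(points, k):
    """Bisecting K-means: repeatedly split the cluster with the largest SSE."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        if sse(worst) == 0.0:
            break  # every remaining cluster is a single repeated value
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters


def flag_anomalies(points, k=3, min_share=0.1):
    """Return points in clusters holding less than min_share of the data.

    ASSUMPTION: the small-cluster rule is illustrative only; the paper's
    actual decision rule is not given in the abstract.
    """
    limit = min_share * len(points)
    clusters = bisecting_kmeans(points, k)
    return sorted(p for c in clusters if len(c) < limit for p in c)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * mean([abs((a - p) / a) for a, p in zip(actual, predicted)])


# Hypothetical energy readings with one obvious spike.
readings = [10.1, 10.3, 9.8, 10.0, 10.2, 9.9, 10.4, 10.0, 9.7, 10.1, 55.0, 10.2]
print(flag_anomalies(readings))
print(mape([100.0, 200.0], [110.0, 190.0]))
```

Splitting the cluster with the largest sum of squared errors is one common selection criterion for bisecting K-means; splitting the largest cluster by size is another, and the abstract does not say which one the paper uses.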


