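Cite this article: | HUANG Qihang, RU Xin, DAI Ning, YU Bo, CHEN Wei, XU Yushan. Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method[J]. Software Engineering, 2024, 27(7): 22-27. |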
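Abstract: To address the low data quality and high data redundancy that arise during data acquisition in weaving workshops, a comprehensive data cleaning method based on cluster analysis is proposed. First, the energy consumption of a textile enterprise's workshop is analyzed hierarchically, and an identification method based on the bisecting K-means algorithm is proposed for abnormal data. Second, missing data are filled using several imputation methods, so that data with different characteristics are each imputed appropriately; to address high redundancy, the coefficient of determination is introduced to deduplicate the dataset. Finally, simulation experiments are carried out on operating data from a textile enterprise's workshop. The results show that deduplication reduces the data volume of the dataset by 83%, while the mean absolute percentage error (MAPE) in the prediction experiments fluctuates within a range of less than 2%. The method therefore reduces data redundancy while preserving the reliability of prediction. |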
Keywords: data cleaning; clustering; anomaly detection; deduplication |
CLC number: TP111.8
Document code: A
|
Funding: Zhejiang Province Science and Technology Plan Project (2022C01202) |
|
Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method |
HUANG Qihang1, RU Xin1, DAI Ning1, YU Bo1, CHEN Wei2, XU Yushan3
|
(1. School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing 312500, China; 3. Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing 312500, China)
2801554196@qq.com; zhitingna@126.com; 990713260@qq.com; angle_xb@163.com; 287270195@qq.com; 1193570378@qq.com
|
Abstract: To address the problems of low data quality and high data redundancy in the data collection process of the weaving workshop, this paper proposes a comprehensive data cleaning method based on cluster analysis. Firstly, a hierarchical analysis is conducted on the workshop energy consumption of textile enterprises, and a method based on the bisecting K-means algorithm is proposed for identifying abnormal data. Secondly, for missing data, several imputation methods are used to fill in data with different characteristics; for the problem of high data redundancy, the coefficient of determination is introduced to deduplicate the dataset and reduce its redundancy. Finally, simulation experiments are conducted on the operating data of a textile enterprise workshop. The results show that after deduplication the data volume of the dataset is reduced by 83%, and the mean absolute percentage error (MAPE) of the prediction experiments fluctuates within a range of less than 2%. The method thus reduces data redundancy while preserving the reliability of prediction. |
Keywords: data cleaning; clustering; anomaly detection; deduplication |
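
For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch of their standard textbook definitions: the coefficient of determination (R²) and the mean absolute percentage error (MAPE). The function names `r_squared` and `mape` and the sample values are illustrative assumptions; the abstract does not describe how the authors apply R² during deduplication, so this block shows only the formulas, not the paper's procedure.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that
    no actual value is zero (the ratio is undefined otherwise).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the actual values are not all identical (SS_tot > 0).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values must not all be equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical energy readings and predictions, for illustration only.
    measured = [100.0, 200.0, 150.0]
    forecast = [110.0, 180.0, 150.0]
    print(f"MAPE = {mape(measured, forecast):.2f}%")
    print(f"R^2  = {r_squared(measured, forecast):.3f}")
```

A MAPE that stays within a narrow band before and after deduplication, as the abstract reports, indicates that the removed records carried little information the prediction model needed.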