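Abstract: Multi-dimensional association rule mining based on the IApriori algorithm suffers from excessive I/O load and exponential growth of candidate itemsets, while its optimized variants are highly random and prone to falling into local optima. To address these problems, this paper proposes a multi-dimensional association rule mining algorithm based on the union of an upper triangular matrix and a multi-way tree (UTMTU). The algorithm encodes and filters the original data, maps it to an upper triangular matrix, and then maps the matrix to a frequent itemset tree, so that the whole process scans the database only once and generates no candidate itemsets. This keeps time and space costs as low as possible, and the number of effective attribute layers is used to improve memory and I/O utilization. A comparative analysis of UTMTU and IApriori shows that the efficiency and accuracy of the algorithm are significantly improved, effectively alleviating the two bottlenecks of the original algorithm.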
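Keywords: multi-dimensional association rules; upper triangular matrix; frequent itemset tree; number of effective attribute layers
CLC number: TP391
Document code: A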
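Funding: Tianjin University-Qinghai Nationalities University Independent Innovation Fund; National Natural Science Foundation of China (61572351); Natural Science Foundation of Tianjin (15JCQNJC00200).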
A Multi-Dimensional Association Rules Mining Algorithm Based on Upper Triangular Matrix-Tree Union
YE Tao, YU Lixia, ZHANG Yaping1,2
(1. College of Computer, Qinghai Nationalities University, Xining 810007, China; 2. School of Computer Science and Technology, Tianjin University, Tianjin 300072, China)
Abstract: The multi-dimensional association rules mining algorithm based on IApriori has several problems: the I/O load is too heavy, the candidate set grows exponentially, and its optimized variants are highly random and easily fall into local optima. Accordingly, this paper proposes a multi-dimensional association rules mining algorithm based on Upper Triangular Matrix-Tree Union (UTMTU). The algorithm encodes and filters the original data, maps it to an upper triangular matrix, and then maps the matrix to a frequent itemset tree. Throughout the process, UTMTU scans the database only once and generates no candidate itemsets, which keeps time and space costs as low as possible. The number of effective attribute layers is used to improve memory and I/O utilization. Compared with the IApriori-based algorithm, the efficiency and accuracy of the UTMTU-based algorithm are significantly improved, which alleviates the two bottlenecks of the original algorithm. Consequently, the UTMTU-based algorithm is more suitable for multi-layer and multi-attribute MARP.
Keywords: multi-dimensional association rules; upper triangular matrix; frequent itemset tree; number of effective attribute layers