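Abstract: To address the sparsity problem that arises when recommender systems rely on users' ratings of items, this paper proposes a recommendation algorithm that combines k-means clustering of label text with matrix decomposition. The model first builds a feature profile for each item from the item's information and uses k-means clustering to determine the number of latent item features. It then applies the latent factor model (LFM) to decompose and reconstruct the user-rating matrix, obtaining predicted ratings, and recommends items by ranking them. Experiments on the MovieLens dataset show that the algorithm performs well in terms of root mean square error (RMSE) and mean absolute error (MAE); on the ml-latest-small dataset, its precision and recall improve on the second-best algorithm by 14.5% and 20.7%, respectively. Applying k-means clustering to the extraction of users' latent interests and items' latent features improves the effectiveness of the recommendation algorithm.
Keywords: recommendation algorithm; matrix decomposition; k-means; LFM
CLC number: TP391
Document code: A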
An Algorithm of Combining K-means Clustering and Matrix Decomposition of Label Text
JU Xiaoyuan, WANG Mingyan
(School of Management, Shanghai University of Engineering Science, Shanghai 201620, China)
ju_xiaoyuan_123@163.com; wmy61610@126.com
Abstract: To address the sparsity problem caused by recommender systems' reliance on user ratings of items, this paper proposes a recommendation algorithm that combines k-means clustering of label text with matrix decomposition. The model first constructs an item feature profile from item information and uses k-means clustering to determine the number of latent item features. The Latent Factor Model (LFM) is then used for matrix decomposition: the user-rating matrix is decomposed and reconstructed to obtain predicted ratings, and recommendations are made by ranking. The algorithm is tested on the MovieLens dataset, and the results show that it performs well in terms of root mean square error (RMSE) and mean absolute error (MAE). On the ml-latest-small dataset, precision and recall improve by 14.5% and 20.7%, respectively, over the second-best algorithm. Applying k-means clustering to the extraction of users' latent interests and items' latent features improves the effectiveness of the recommendation algorithm.
Keywords: recommendation algorithm; matrix decomposition; k-means; LFM
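
The decomposition, ranking, and error metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SGD update rule, the hyperparameters (`lr`, `reg`, `n_epochs`), the toy ratings, and all function names are assumptions made for illustration. The k-means step is not shown; `n_factors` simply stands in for the number of latent features that the paper obtains from clustering the item profiles.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_lfm(ratings, n_factors, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit latent vectors P (users) and Q (items) by SGD (assumed optimizer).

    ratings   -- list of (user, item, rating) triples
    n_factors -- latent dimension; assumed here to be supplied by the
                 k-means step described in the abstract
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean square error over (user, item, rating) triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def mae(P, Q, ratings):
    """Mean absolute error over (user, item, rating) triples."""
    return sum(abs(r - predict(P, Q, u, i)) for u, i, r in ratings) / len(ratings)


def recommend(P, Q, u, seen, top_n=10):
    """Rank items the user has not rated by predicted rating, highest first."""
    candidates = [i for i in Q if i not in seen]
    return sorted(candidates, key=lambda i: predict(P, Q, u, i), reverse=True)[:top_n]


# Toy data (hypothetical, not from MovieLens).
toy = [("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u2", "m1", 4.0),
       ("u2", "m3", 1.0), ("u3", "m2", 2.0), ("u3", "m3", 5.0)]
P, Q = train_lfm(toy, n_factors=2)
print(rmse(P, Q, toy), mae(P, Q, toy))
print(recommend(P, Q, "u1", seen={"m1", "m2"}))
```

In a real evaluation the error metrics would be computed on held-out ratings rather than on the training triples used here.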