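Abstract: Existing recommendation algorithms based on content similarity often ignore word order and contextual information when processing text, and their computational complexity is high. This paper therefore proposes a heuristic method based on filtering redundant information similarity and applies it to movie recommendation, achieving more accurate recommendations. Compared with the other algorithms, the proposed algorithm improves accuracy by 0.07 to 0.24 percentage points when predicting one movie and by 0.05 to 0.30 percentage points when predicting three movies. Taking the recall of the proposed algorithm as the baseline (set to 100%), the recall of the other algorithms is only 2.38% to 70.24% of that baseline when predicting one movie and only 3.78% to 84.87% when predicting three movies. These results demonstrate the effectiveness and feasibility of the algorithm.
Keywords: recommendation system; content similarity; filtering redundant information; LZ77 algorithm; Huffman coding
CLC number: TP391.4
Document code: A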
Funding: National Natural Science Foundation of China (61802349)
Movie Recommendation Algorithm Based on Filtering Redundant Information Similarity
AI Jun, SUN Yang, SU Zhan, FANG Yuanjiang, XIE Zhengbin
(School of Opto-Electronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
aijun@usst.edu.cn; sy_usst_net@163.com; suzhan@foxmail.com; 2569042572@qq.com; 1134619040@qq.com
Abstract: Existing content-similarity-based recommendation algorithms often overlook word order and contextual information when processing text, and they also have high computational complexity. This paper therefore proposes a heuristic method based on filtering redundant information similarity and applies it to movie recommendation, achieving more accurate recommendations. Compared with the other algorithms, the proposed algorithm improves accuracy by 0.07 to 0.24 percentage points when predicting a single movie and by 0.05 to 0.30 percentage points when predicting three movies. With the recall of the proposed algorithm taken as the benchmark (set to 100%), the recall of the other algorithms is only 2.38% to 70.24% of that benchmark when predicting a single movie and only 3.78% to 84.87% when predicting three movies. These results demonstrate the effectiveness and feasibility of the proposed algorithm.
Keywords: recommendation system; content similarity; filtering redundant information; LZ77 algorithm; Huffman coding