Design and Implementation of a Spark-based Movie Recommendation System
LIU Nian, CAI Chunhua
(Huanghe Jiaotong University, Jiaozuo 454000, China)
n135691@foxmail.com; 786803383@qq.com

Abstract: To address the information overload caused by the rapid growth of movie resources in the era of big data, this paper designs and implements a movie recommendation system built on the Spark distributed computing platform, combined with MongoDB (a distributed document database) as the business database, Redis (Remote Dictionary Server) as the cache database, Flume (a log collection system), Kafka (an open-source stream processing platform), and the Spring open-source framework. The system consists mainly of a data collection module, an offline recommendation module, a real-time recommendation module, and an integrated business module. The offline recommendation module uses the Alternating Least Squares (ALS) algorithm to train a latent factor model and obtain predicted ratings, and the real-time recommendation module uses Spark Streaming to process, in real time, the ratings that users give to movies in the visual interface. Functional and compatibility tests show that the system can complete movie recommendations quickly and accurately based on users' behavior preferences, and that its usability is high.
Keywords: offline recommendation; movie recommendation; Spark
CLC number: TP391
Document code: A

Funding: Huanghe Jiaotong University university-level First-Class Course Construction Project "Linux Principles and Applications" (HHJTXY-2021ylkc13); Huanghe Jiaotong University university-level Teaching Resource Library Construction Project "Principles of NoSQL Databases" (HHJTXY-2022kczyk103).