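Abstract: Analyzing campus big data offers a new direction for the development of campus informatization. Hadoop is a distributed system infrastructure developed by the Apache Foundation; it is an ecosystem that integrates distributed computing, storage, and management. The currently popular Spark framework is a distributed computing platform similar to MapReduce in the Hadoop ecosystem, and Spark is faster than MapReduce and offers richer functionality. Following the main thread of data collection, data storage, data analysis, and data presentation, this paper combines the Hadoop and Spark frameworks, the most popular in the big data field, to propose a big data platform architecture for university campuses. It describes the specific functions of each layer of the architecture in detail, gives a detailed account of how relational database data is collected and stored within the architecture, and finally designs a prototype campus big data analysis system to verify the feasibility of the architecture.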
Keywords: big data; Hadoop; Spark; campus big data platform
CLC number: TP391
Document code: A
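Funding: Jiangyin Polytechnic College project "Construction and Research of a Spark-Based Big Data Processing Platform" (17E-JS-25); sub-project of the Jiangsu Software and Service Outsourcing Training Base, "Innovative Application Practice of a Spark-Based Big Data Experience System" (2017-PPZY-A-R-19).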
Research on University Campus Big Data Platform Based on Hadoop and Spark
LIU Ping
(Department of Computer Science, Jiangyin Polytechnic College, Jiangyin 214400, China)
Abstract: The analysis of campus big data is a new approach to the development of campus informatization. Hadoop is a distributed system infrastructure developed by the Apache Foundation, and it forms an ecosystem integrating distributed computing, storage, and management. The currently popular Spark framework is a distributed computing platform similar to MapReduce in the Hadoop ecosystem; Spark is faster than MapReduce and provides richer functionality. Taking data collection, data storage, data analysis, and data presentation as the main line, this paper proposes a big data platform architecture for university campuses that combines the Hadoop and Spark frameworks, the most popular in the big data field. It expounds the specific functions of each layer of the architecture in detail and describes in detail the collection and storage of relational database data within the architecture. Finally, a prototype campus big data analysis system is designed to verify the feasibility of the architecture.
Keywords: big data; Hadoop; Spark; campus big data platform