Abstract: With the advent of the Internet, the computer industry has developed rapidly, and the business data of companies has reached the petabyte scale. Combining Hive and HBase in the Hadoop framework, this paper describes each module in detail, with emphasis on the steps of cluster setup, how data in the cluster is collected and cleaned, and how tables are created to store the analysis results.
Keywords: mass data; Hadoop; Hive; data acquisition; data cleaning
CLC number: TP311
Document code: A
Data Acquisition and Data Cleaning Based on the Hadoop Cluster

LIU Chen, JIAO Hejun1,2

(1. Unit 71320, Kaifeng 475000, China; 2. School of Computer Science, Henan University of Engineering, Zhengzhou 451191, China)
Abstract: With the flourishing development of the computer industry, business data in enterprises has reached the petabyte scale. Based on Hive and HBase in the Hadoop framework, this paper elaborates on each module and analyzes the process of cluster construction, data acquisition, data cleaning, and the creation of tables to store analysis results.
Keywords: mass data; Hadoop; Hive; data acquisition; data cleaning