Abstract: With the advent of the Internet, the computer industry has developed rapidly, and the business data of companies has reached the petabyte scale. Combining Hive and HBase in the Hadoop framework, this paper describes each module in detail, with emphasis on the steps of cluster setup, how data in the cluster is collected and cleaned, and how tables are created to store the analysis results.
Keywords: mass data; Hadoop; Hive; data acquisition; data cleaning
CLC number: TP311
Document code: A
Data Acquisition and Data Cleaning Based on the Hadoop Cluster

LIU Chen, JIAO Hejun1,2

(1. Unit 71320, Kaifeng 475000, China; 2. School of Computer Science, Henan University of Engineering, Zhengzhou 451191, China)
Abstract: With the flourishing development of the computer industry, business data in enterprises has reached the petabyte scale. Based on Hive and HBase in the Hadoop framework, this paper elaborates on each module and analyzes the process of cluster construction, data acquisition, data cleaning, and the creation of tables to store analysis results.
Keywords: mass data; Hadoop; Hive; data acquisition; data cleaning