Software Engineering (软件工程)

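Cite this article:

HU Ruijuan. Research on Hot Word Discovery and Visualization Technology Based on Big Data Architecture[J]. Software Engineering (软件工程), 2018, 21(5): 1-3.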


Research on Hot Word Discovery and Visualization Technology Based on Big Data Architecture

HU Ruijuan

(Information Engineering University, Zhengzhou 450000, Henan, China)

Abstract: In the era of big data, data is growing far faster than manual analysis can keep up with, so building a mechanism for hot word discovery and visualization is both urgent and important. This paper studies the MapReduce computing framework on the Hadoop big data platform together with the TF-IDF algorithm, and gives a concrete implementation of TF-IDF on the Hadoop distributed parallel computing platform. The parallelized algorithm serves as the core of hot word discovery under a big data architecture, and a visualization tool is then used to analyze and present the results. The results show that the parallelized TF-IDF algorithm discovers hot words in large-scale data well, and that it processes data more efficiently than the traditional single-machine algorithm.

Keywords: Hadoop; TF-IDF; parallelization; hot word discovery; visualization

CLC number: TP391 Document code: A

Fund project: 2017 Luoyang Social Science Planning Project, "Research on Hot Word Discovery and Visualization Technology Based on Big Data Architecture".
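
The abstract does not reproduce the Hadoop implementation itself, so the following is only a minimal single-process sketch of how TF-IDF is commonly decomposed into MapReduce-style stages. It is not the paper's code. The function names (`map_term_counts`, `shuffle`, `tf_idf`, `hot_words`), the whitespace tokenization, the formula tf = n / document length with idf = log(D / df), and the ranking of hot words by their maximum score are all assumptions made for illustration. Chinese text would first need a word segmenter, which is omitted here.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step: emit ((term, doc_id), 1) for every whitespace-separated token."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def tf_idf(docs):
    """Return {(term, doc_id): score} with tf = n / doc length, idf = log(D / df)."""
    # Stage 1: count occurrences of each term in each document.
    pairs = [kv for doc_id, text in docs.items()
             for kv in map_term_counts(doc_id, text)]
    term_doc_counts = {key: sum(values) for key, values in shuffle(pairs).items()}

    # Stage 2: total number of tokens per document (denominator of tf).
    doc_lengths = Counter()
    for (_term, doc_id), n in term_doc_counts.items():
        doc_lengths[doc_id] += n

    # Stage 3: number of documents containing each term (df).
    doc_freq = Counter(term for term, _doc_id in term_doc_counts)

    total_docs = len(docs)
    return {
        (term, doc_id): (n / doc_lengths[doc_id])
        * math.log(total_docs / doc_freq[term])
        for (term, doc_id), n in term_doc_counts.items()
    }


def hot_words(docs, k=3):
    """Rank terms by their highest TF-IDF score in any document (assumed criterion)."""
    best = defaultdict(float)
    for (term, _doc_id), score in tf_idf(docs).items():
        best[term] = max(best[term], score)
    return sorted(best, key=best.get, reverse=True)[:k]
```

On a real cluster each stage would run as its own MapReduce job over HDFS input splits. Here the three stages share memory only to keep the sketch short. Note that a term appearing in every document receives an idf of zero under this formula, so it never ranks as a hot word.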
