Software Engineering (软件工程)

Cite this article:

XIA Huosong, LI Baoguo. Crawler Algorithms of Dynamic Web Reviews Based on Python [J]. Software Engineering, 2016, 19(2): 43-46.


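Crawler Algorithms of Dynamic Web Reviews Based on Python

XIA Huosong, LI Baoguo

(School of Management, Wuhan Textile University, Wuhan, Hubei 430073, China)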

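Abstract: Big data acquisition faces the problem of how to collect dynamic review pages. This paper uses information from static web pages to construct dynamic links and proposes a Python-based crawler algorithm for dynamic web reviews. A review collection program is implemented on this basis. Finally, the algorithm is compared with a general-purpose crawler algorithm, which confirms that it is strongly targeted, collects data quickly, is easy to embed in other development work, and is simple. It offers researchers in journalism, literature, management, and other disciplines who are not proficient in programming a fast way to obtain review information.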

Keywords: Python language; static address; dynamic link; dynamic web page; reviews; crawler algorithm

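CLC number: TP312 Document code: A

Funding: Supported by the National Natural Science Foundation of China (71571139), "Research on Outlier Analysis and Outlier Knowledge Management Models in Big Data Scenarios".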

Crawler Algorithms of Dynamic Web Reviews Based on Python

XIA Huosong, LI Baoguo

(School of Management, Wuhan Textile University, Wuhan 430073, China)

Abstract: One problem in big data acquisition is how to collect dynamic comment pages. This paper uses information from static pages to construct dynamic links and proposes a Python-based crawler algorithm for dynamic web reviews. On this basis, a comment collector is implemented. Finally, the algorithm is compared with a general crawler algorithm. The comparison shows that the algorithm is strongly targeted, collects data quickly, is easy to embed in other development work, and is simple. It gives researchers who are not proficient in programming fast access to large data sources.

Keywords: Python language; static address; dynamic link; dynamic web; reviews; crawler algorithm
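
The abstract's central idea, using information found in a static page to construct dynamic comment links, might look roughly like the sketch below. It is not the authors' implementation, which this page does not show. The `itemId` pattern, the `page` parameter, the `example.com` address, and both function names are hypothetical and chosen only for illustration; real sites use their own identifiers and URL schemes.

```python
import re
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical pattern: assumes the static page embeds a numeric id
# in the form "itemId=12345". Real sites will differ.
ITEM_ID_RE = re.compile(r"itemId=(\d+)")


def extract_item_id(static_html: str) -> Optional[str]:
    """Return the first item id found in the static page text, or None."""
    match = ITEM_ID_RE.search(static_html)
    return match.group(1) if match else None


def build_comment_urls(base_url: str, item_id: str, pages: int) -> List[str]:
    """Construct one dynamic comment link per page number (1-based)."""
    return [
        base_url + "?" + urlencode({"itemId": item_id, "page": page})
        for page in range(1, pages + 1)
    ]


if __name__ == "__main__":
    # Stand-in for a downloaded static page; no network access is needed.
    html = '<a href="/detail?itemId=12345">product</a>'
    item_id = extract_item_id(html)
    if item_id is not None:
        for url in build_comment_urls("https://example.com/comments", item_id, 3):
            print(url)
```

Fetching each constructed link and parsing the returned comments is left out here, because the response format depends entirely on the target site.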
