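Abstract: Big data acquisition faces the problem of how to collect dynamically loaded review pages. This paper uses information from static web pages to construct dynamic links and proposes a Python-based crawler algorithm for dynamic web reviews. A review collection program is implemented on this basis. Finally, the algorithm is compared with a general-purpose crawler algorithm; the comparison shows that it is well targeted, collects data quickly, is easy to embed in other development work, and is simple. It gives researchers in journalism, literature, management, and other disciplines who are not proficient in programming a fast way to obtain review data.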
Keywords: Python language; static address; dynamic link; dynamic web reviews; crawler algorithm
CLC number: TP312
Document code: A
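Funding: National Natural Science Foundation of China (71571139), "Research on Outlier Analysis and Outlier Knowledge Management Models in Big Data Scenarios".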
Crawler Algorithms of Dynamic Web Reviews Based on Python
XIA Huosong,LI Baoguo
(School of Management, Wuhan Textile University, Wuhan 430073, China)
Abstract: A recurring problem in big data acquisition is how to collect dynamically loaded comment pages. This paper uses information from static pages to construct dynamic links and designs a Python-based crawler algorithm for dynamic web reviews. On this basis, a comment collector is implemented. Finally, the algorithm is compared with a general-purpose crawler algorithm. The comparison shows that the algorithm is well targeted, collects data quickly, is easy to embed in other programs, and is simple. It gives researchers who are not proficient in programming fast access to large data sources.
Keywords: Python language; static address; dynamic link; dynamic web reviews; crawler algorithm