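Abstract: To mitigate the performance degradation that current big data stream computing engines suffer under high load when processing dense windows, this paper analyzes the performance bottleneck of the native window mechanism and the shortcomings of existing optimization methods, including the need for extra memory to store the input data stream and the inability to clean up cached state automatically. It then proposes an optimization scheme based on a key-window mechanism, which reduces the number of windows that must be created during stream computation and thereby lowers system load. A comparative analysis against the native mechanism demonstrates the effectiveness of the scheme. The scheme is compatible with existing frameworks, requires little modification of downstream systems, and improves both memory usage and I/O frequency.
Keywords: big data; stream computing; window computing; Flink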
CLC number: TP316.4
Document code: A
Research on Performance Optimization of Dense Sliding Windows in Streaming Computing Engines
CHENG Shengyang
(China Unionpay, Shanghai 201201, China)
chengshengyang@unionpay.com
Abstract: To alleviate the performance degradation that current big data streaming computing engines suffer under high load when processing dense windows, this paper analyzes the performance bottleneck of the native window mechanism and points out the shortcomings of existing optimization schemes, including the need for additional memory to store the input data stream and the inability to automatically clean the state cache. An optimization scheme based on a key-window mechanism is then proposed, which reduces the number of windows that must be created in streaming computation and therefore reduces the system load. A comparative analysis with the native mechanism shows the effectiveness of this optimization. The scheme is compatible with existing frameworks, requires little modification of downstream systems, and improves both memory usage and I/O frequency.
Keywords: big data; streaming computing; window computing; Flink