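Abstract: To address the problem that honeynet systems are easily identified by attackers through their delay characteristics, a business network delay simulation algorithm based on ensemble learning is proposed. The method first collects network traffic and delay information in the local area network where the business service is located and obtains a data set after preprocessing. Second, based on the Stacking ensemble learning method, with a random forest as the meta-learner, three Boosting-family models are used as primary learners, and their fused predictions serve as the baseline value for delay prediction. Next, a piecewise regression tree is used to predict the delay jitter characteristics. Finally, the delay baseline and the jitter characteristics are superimposed to obtain a comprehensive delay model that conforms to the delay and jitter characteristics of the local area network; honeynet delay simulation is implemented on the basis of this model, reducing the probability of identification by attackers. Experimental results show that, compared with the GBDT, XGBoost, and CatBoost algorithms, the proposed method improves MSE (Mean Square Error) and MAPE (Mean Absolute Percentage Error) by 35.5% and 21.3%, respectively, and captures fine-grained detail well.
Keywords: ensemble learning; network delay; Boosting; Stacking; honeynet
CLC number: TP181
Document code: A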

Funding: Zhejiang Provincial Basic Public Welfare Research Program (LGG20F020016).

Business Network Delay Simulation based on Integrated Learning
CHEN Rucong, ZHANG Huaxiong
(School of Information, Zhejiang Sci-Tech University, Hangzhou 310018, China)
735874531@qq.com; zhxhz@zstu.edu.cn
Abstract: Honeynet systems are easily identified by attackers through their delay characteristics. To address this problem, this paper proposes a business network delay simulation algorithm based on ensemble learning. First, network traffic and delay information are collected in the local area network (LAN) where the business service is located, and a data set is obtained after preprocessing. Second, using the Stacking ensemble learning method with a random forest as the meta-learner, three Boosting-family models serve as primary learners, and their fused predictions are taken as the baseline value for delay prediction. Next, a piecewise regression tree is used to predict the delay jitter characteristics. Finally, the delay baseline and the jitter characteristics are superimposed to obtain a comprehensive delay model that conforms to the delay and jitter characteristics of the LAN. Honeynet delay simulation is implemented on the basis of this model, reducing the probability of identification by attackers. Experimental results show that, compared with the GBDT (Gradient Boosting Decision Tree), XGBoost, and CatBoost algorithms, the proposed method improves MSE (Mean Square Error) and MAPE (Mean Absolute Percentage Error) by 35.5% and 21.3%, respectively, and captures fine-grained detail well.
Keywords: ensemble learning; network delay; Boosting; Stacking; honeynet
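
The final step described in the abstract, superimposing a delay baseline and a jitter term, together with the two evaluation metrics, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stacked ensemble and the piecewise regression tree are not reproduced here. The baseline delay is taken as a given number, and the jitter model is replaced by a hypothetical lookup table (`PiecewiseJitter`) that assigns one Gaussian jitter scale to each segment of an assumed normalized traffic-load feature. All names, breakpoints, and scales are illustrative assumptions.

```python
import random
from bisect import bisect_right


def mse(actual, predicted):
    """Mean square error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


class PiecewiseJitter:
    """Hypothetical stand-in for the jitter model: one jitter scale per load segment.

    The paper predicts jitter with a piecewise regression tree; this class only
    mimics the piecewise structure using fixed, hand-chosen values.
    """

    def __init__(self, breakpoints, scales):
        # n sorted breakpoints split the load axis into n + 1 segments.
        if len(scales) != len(breakpoints) + 1:
            raise ValueError("need exactly one scale per segment")
        self.breakpoints = list(breakpoints)
        self.scales = list(scales)

    def scale(self, load):
        # bisect_right returns the index of the segment that contains `load`.
        return self.scales[bisect_right(self.breakpoints, load)]

    def sample(self, load, rng):
        # Assumption: zero-mean Gaussian jitter with a segment-dependent spread.
        return rng.gauss(0.0, self.scale(load))


def simulate_delay(baseline_ms, load, jitter_model, rng):
    """Superimpose the baseline delay and a jitter sample; never return a negative delay."""
    return max(0.0, baseline_ms + jitter_model.sample(load, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    jitter = PiecewiseJitter(breakpoints=[0.3, 0.7], scales=[0.05, 0.2, 0.8])
    # Baseline values (ms) would come from the stacked ensemble; here they are made up.
    for baseline, load in [(1.2, 0.1), (1.5, 0.5), (2.4, 0.9)]:
        print(f"load={load:.1f} delay={simulate_delay(baseline, load, jitter, rng):.3f} ms")
```

Clamping at zero is a design choice of this sketch, made so that a large negative jitter sample cannot produce a negative delay; the source does not say how the original model handles that case.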