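Abstract: To optimize regional traffic signal timing schemes and improve regional traffic efficiency, this paper proposes a regional traffic signal coordination control method based on improved multi-agent Nash Q Learning. First, a discretization coding method is adopted, in which the road is divided into cells so that continuous state information is converted into a discrete form. Second, a Long Short Term Memory (LSTM) module is integrated into the algorithm to mine additional hidden information from the state data and enrich the state data in the Q value table. Finally, simulation tests on the microscopic traffic simulation software SUMO (Simulation of Urban Mobility) show that, compared with the original Nash Q Learning traffic signal control method, the proposed method reduces the average vehicle waiting time by 11.5%, 16.2%, and 10.0% under low, medium, and high traffic flows, respectively; the average queue length by 9.1%, 8.2%, and 7.6%; and the average number of stops by 18.3%, 16.1%, and 10.0%. The results show that the proposed algorithm achieves better control performance.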
Keywords: regional traffic signal coordination control; Markov decision; multi-agent Nash Q Learning; LSTM; SUMO
CLC number: TP181
Document code: A
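Funding: National Natural Science Foundation of China (61603154); Zhejiang Provincial Natural Science Foundation of China (LTGS23F030002); Jiaxing Applied Basic Research Project (2023AY11034); Open Research Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52)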
Traffic Signal Coordination Control Based on Improved Multi-Agent Nash Q Learning
SU Gang1,2, YE Baolin2, YAO Qing1, CHEN Bin2, ZHANG Yijia1
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. Jiaxing Key Laboratory of Smart Transportations, Jiaxing University, Jiaxing 314001, China)
1018478742@qq.com; yebaolin@zjxu.edu.cn; q-yao@zstu.edu.cn; chenbin@zjxu.edu.cn; waiting@zstu.edu.cn
Abstract: To optimize the coordinated timing scheme of regional traffic signals and improve regional traffic efficiency, this paper proposes a regional traffic signal coordination control method based on improved multi-agent Nash Q Learning. First, a discretization coding method is employed: the road is divided into cells so that continuous state information is converted into a discrete form. Second, a Long Short Term Memory (LSTM) module is incorporated into the algorithm to mine more hidden information from the state data and enrich the state data in the Q value table. Finally, simulation tests based on the microscopic traffic simulation software SUMO (Simulation of Urban Mobility) show that, compared with the original Nash Q Learning traffic signal control method, the proposed method reduces the average vehicle waiting time by 11.5%, 16.2%, and 10.0% under low, medium, and high traffic flows, respectively. It also decreases the average queue length by 9.1%, 8.2%, and 7.6%, and reduces the average number of stops by 18.3%, 16.1%, and 10.0%. The results demonstrate that the proposed algorithm achieves better control performance.
Keywords: regional traffic signal coordination control; Markov decision; multi-agent Nash Q Learning; LSTM; SUMO
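The cell-based discretization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the encoding details, so everything here is an assumption made for illustration: the function name `encode_lane`, the use of a binary occupancy flag per cell, the 5 m cell length, and the representation of vehicle positions as distances to the stop line are all hypothetical and may differ from the paper's actual scheme.

```python
from typing import Sequence, Tuple


def encode_lane(
    positions_m: Sequence[float],
    lane_length_m: float = 100.0,
    cell_length_m: float = 5.0,
) -> Tuple[int, ...]:
    """Convert continuous vehicle positions on one lane into discrete cells.

    Hypothetical encoding: the lane is split into equal-length cells and a
    cell is set to 1 if at least one vehicle lies inside it, otherwise 0.
    """
    n_cells = int(lane_length_m // cell_length_m)
    cells = [0] * n_cells
    for pos in positions_m:
        # Ignore vehicles outside the observed stretch of the lane.
        if 0.0 <= pos < n_cells * cell_length_m:
            cells[int(pos // cell_length_m)] = 1
    # A tuple is hashable, so it can serve as a key in a tabular Q function.
    return tuple(cells)


# Four vehicles on a 50 m lane with 5 m cells (distances to the stop line).
state = encode_lane([2.0, 7.5, 8.1, 43.0], lane_length_m=50.0, cell_length_m=5.0)
print(state)
```

Returning a hashable tuple is a design choice for this sketch: it lets the discrete state index a dictionary-based Q value table directly, which is one common way to store tabular Q values.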