| Downloads | Citations | Reads |
| 1,066 | 0 | 31 |
Aiming at the autonomous collision avoidance problem of multiple unmanned aerial vehicles (UAVs) in complex environments, the proximal policy optimization (PPO) algorithm is adopted to study cooperative collision avoidance strategies for multiple UAVs. First, to address the poor coordination observed during multi-UAV collision avoidance, a CNN-LSTM fusion network combining a convolutional neural network (CNN) and a long short-term memory (LSTM) network is designed, where the LSTM is introduced to provide memory capability. The fusion network exploits the strength of the CNN in feature extraction and the advantage of the LSTM in processing sequential data, improving the robustness of UAV decision-making. Second, a novel reward function is designed based on an artificial-potential-field reward-shaping technique. By combining one main reward with several auxiliary rewards, reward sparsity is effectively avoided: the main reward guides each UAV toward its predetermined target, while the auxiliary rewards encourage more flexible and robust actions, enabling more effective autonomous collision avoidance in complex environments. Finally, multiple comparative experiments and robustness tests in high-density UAV environments show that the proposed algorithm achieves significant advantages in both reward return and collision-avoidance success rate, demonstrating good environmental adaptability.
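The clipped surrogate objective at the core of the PPO algorithm mentioned above can be sketched per sample as follows. This is a minimal illustration, not the paper's implementation; the function name and the default clip range `eps = 0.2` are assumptions:

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage A(s, a)
    eps       -- clip range; keeps the update within [1 - eps, 1 + eps]
    """
    unclipped = ratio * advantage
    # Clamp the ratio, then take the pessimistic (smaller) objective.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)
```

The `min` over the clipped and unclipped terms is what prevents a single large policy update, which is why PPO trains stably enough to serve as the base learner here.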
Abstract: Aiming at the autonomous collision avoidance problem of multiple unmanned aerial vehicles (multi-UAVs) in complex environments, the proximal policy optimization (PPO) algorithm is adopted to investigate the autonomous collision avoidance strategy for multi-UAVs. Firstly, to address the problem of poor coordination during multi-UAV collision avoidance, a CNN-LSTM fusion network incorporating a convolutional neural network (CNN) and a long short-term memory (LSTM) network is designed, with the LSTM introduced to construct memory functionality. The fusion network fully utilizes the capability of the CNN in feature extraction and the advantages of the LSTM in processing sequential data, thereby improving the robustness of UAV decision-making. Secondly, a novel reward function is designed based on an artificial potential field reward-shaping technique. By designing a main reward combined with several auxiliary rewards, reward sparsity is effectively avoided. The main reward guides the UAV toward the predetermined target, while the auxiliary rewards encourage the UAV to take more flexible and robust actions, thus achieving more effective autonomous collision avoidance in complex environments. Finally, the algorithm is validated through simulations in both obstacle-free and obstacle environments. Simulation results demonstrate that the CLPPO algorithm based on the CNN-LSTM fusion network effectively improves algorithm performance. In addition, comparative experiments and robustness tests under high-density UAV environments show that the algorithm exhibits excellent performance in terms of return and collision avoidance success rate.
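The main-plus-auxiliary reward structure described in the abstract can be illustrated with a minimal potential-based shaping sketch. All names, gains, and the safety radius below are illustrative assumptions, not the paper's actual reward function:

```python
import math

def potential(pos, goal):
    """Attractive potential: negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)

def shaped_reward(prev_pos, pos, goal, obstacles,
                  d_safe=1.0, k_obs=0.5, gamma=0.99):
    """Main reward via potential-based shaping plus an auxiliary
    obstacle-repulsion penalty; gains and radius are illustrative."""
    # Main reward: shaping term F = gamma * Phi(s') - Phi(s),
    # positive whenever the UAV moves closer to its goal, so the
    # agent receives dense guidance instead of a sparse goal bonus.
    main = gamma * potential(pos, goal) - potential(prev_pos, goal)
    # Auxiliary reward: linear penalty once the UAV enters the
    # safety radius of any obstacle, discouraging risky paths.
    aux = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if d < d_safe:
            aux -= k_obs * (d_safe - d)
    return main + aux
```

Because the main term is a potential difference, it densifies the reward signal without changing which policies are optimal, which is the standard motivation for potential-field reward shaping.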
Basic information:
DOI:10.12194/j.ntu.20241112003
CLC number: TP18; V279; V249
Citation:
[1] LIANG C Q, LI L, LIU L. Multi-UAV autonomous collision avoidance via deep reinforcement learning with a CNN-LSTM fusion network[J]. Journal of Nantong University (Natural Science Edition), 2025, 24(4):1-9+20. DOI:10.12194/j.ntu.20241112003. (in Chinese)
Funding:
Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24-0836); Open Fund of a Key Laboratory of the Ministry of Education (Scip20240111); Fundamental Research Funds for the Central Universities (B240203012)