SMART PATH PLANNING FOR MOBILE MULTI-TARGET VISITATION IN UAV MISSIONS IN STOCHASTIC AND ADVERSARIAL ENVIRONMENTS
Abstract
Search and rescue operations in challenging environments such as disaster sites and dense forests are often highly costly in time and human resources. These environments pose significant risks to human operators and can be difficult to navigate effectively. Autonomous Unmanned Aerial Vehicles (UAVs) offer a promising solution: by leveraging their agility and autonomy, UAVs can perform dangerous tasks more safely and efficiently than human teams, reducing risks to personnel while improving the speed and effectiveness of operations in hazardous or hard-to-reach areas. This thesis tackles a critical challenge in UAV path planning: efficiently visiting multiple mobile targets in complex, obstacle-filled environments. We first apply online learning algorithms tailored for UAV search missions, where the UAV must locate a mobile target formation whose movement follows an unknown and potentially non-stationary probability distribution, learning the formation's strategy over time. To formalize this approach, we define an optimization problem and solve it with the Exp3 algorithm (Exponential-weight algorithm for Exploration and Exploitation). To enhance the learning process, we integrate environmental observations as context, yielding a variant known as Contextual Exp3 (C-Exp3). C-Exp3, however, is limited in scenarios where the target formation's strategy changes over time. To address this limitation, we propose Adaptive Contextual Exp3 (AC-Exp3), which incorporates a human-centric drift-detection mechanism to identify changes in the formation's strategy and adjust the learning process accordingly. Additionally, we employ the Exp4 algorithm as a self-adjusting meta-learner that responds effectively to fluctuations in the formation's strategy.
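At its core, the Exp3 scheme mentioned above maintains one exponential weight per candidate action (e.g., per search region), mixes the induced distribution with uniform exploration, and updates the played action's weight with an importance-weighted reward estimate. The following is a minimal sketch of that generic update, not the thesis's implementation; the `reward_fn(t, arm)` interface returning rewards in [0, 1] is an assumption for illustration:

```python
import math
import random

def exp3(K, T, reward_fn, gamma=0.1, seed=0):
    """Minimal Exp3 sketch: K arms, T rounds, exploration rate gamma.

    reward_fn(t, arm) -> reward in [0, 1] is a hypothetical interface
    standing in for the UAV's observed search reward.
    """
    rng = random.Random(seed)
    w = [1.0] * K            # exponential weights, one per arm
    total_reward = 0.0
    for t in range(T):
        s = sum(w)
        # mix the weight distribution with uniform exploration
        p = [(1.0 - gamma) * wi / s + gamma / K for wi in w]
        arm = rng.choices(range(K), weights=p)[0]
        x = reward_fn(t, arm)
        total_reward += x
        # importance-weighted estimate x / p[arm] keeps the update unbiased
        w[arm] *= math.exp(gamma * (x / p[arm]) / K)
    return total_reward, w
```

Because only the pulled arm's weight is updated, and the reward is divided by the probability of pulling it, each arm's cumulative reward estimate stays unbiased even under adversarial reward sequences, which is what makes Exp3 suitable for the non-stationary setting studied here.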
Following this, we introduce a Deep Deterministic Policy Gradient (DDPG) reinforcement learning framework that allows the UAV to learn the stochastic distribution of the mobile targets and determine an optimal path for visiting them while avoiding obstacles. We evaluate DDPG, C-Exp3, AC-Exp3, and Exp4 through a series of experiments in stochastic and non-stationary environments. Our primary objective is to approach the unknown optimal policy as time t nears the horizon T, thereby demonstrating the UAV's capacity to learn the formation's strategy. Our results show that AC-Exp3 adapts to non-stationary environments better than C-Exp3, while Exp4 proves a robust performer that swiftly adjusts to new strategies. Furthermore, increasing the number of targets significantly degrades the DDPG agent's performance and requires more training time, and extremely cluttered environments adversely affect the agent's performance, reducing the mission's success rate in terms of target visitation.
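DDPG trains a deterministic actor and a Q-critic from a replay buffer of past transitions and stabilizes learning by nudging target networks slowly toward the online networks (Polyak averaging, θ' ← τθ + (1 − τ)θ'). The sketch below shows these two standard ingredients in isolation; the class names, capacity, and τ value are illustrative assumptions, not details taken from the thesis:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience buffer, as used in DDPG-style training."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)   # old transitions are evicted
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling decorrelates consecutive UAV transitions
        return self.rng.sample(list(self.buf), batch_size)

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move target weights a small step toward online weights."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

The small τ means the critic's bootstrap targets change slowly, which is the standard remedy for the instability of off-policy actor-critic training in continuous action spaces such as UAV waypoint control.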
DOI/handle
http://hdl.handle.net/10576/66581