Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks
In Unmanned Aerial Vehicle (UAV)-enabled wireless powered sensor networks, a UAV can be employed to charge the ground sensors remotely via Wireless Power Transfer (WPT) and to collect their sensory data. This paper focuses on trajectory planning of the UAV for aerial data collection and WPT, with the aim of minimizing buffer overflow at the ground sensors and unsuccessful transmissions due to lossy airborne channels. The network state comprises the battery levels and buffer lengths of the ground sensors, the channel conditions, and the location of the UAV. Flight trajectory planning is formulated as a Partially Observable Markov Decision Process (POMDP), in which the UAV has only partial observation of the network state. In practice, the POMDP for a UAV-enabled sensor network has a large number of states and actions, while up-to-date knowledge of the network state is not available at the UAV. To address these issues, we propose an onboard deep reinforcement learning algorithm that optimizes the real-time trajectory planning of the UAV given outdated knowledge of the network state.
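As an illustration of the partial-observation setting described above, the following Python sketch shows one way the UAV's outdated view of the network could be represented. It is not taken from the paper: the names `SensorState`, `Observation`, `observe`, and `step_cost`, the per-sensor "age" counter, and the equal weighting of overflow and failed transmissions in the cost are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SensorState:
    """True state of one ground sensor (hypothetical units)."""
    battery: float    # remaining energy
    buffer_len: int   # packets generated and waiting for collection
    buffer_cap: int   # packets the buffer can hold


@dataclass
class Observation:
    """What the UAV knows: its own position plus stale sensor snapshots."""
    uav_pos: Tuple[float, float]
    last_seen: List[SensorState]  # snapshot from the last visit to each sensor
    age: List[int]                # steps elapsed since each snapshot was taken


def observe(obs: Observation, true_states: List[SensorState],
            visited: int, uav_pos: Tuple[float, float]) -> Observation:
    """Refresh only the visited sensor; every other snapshot grows older."""
    last_seen = list(obs.last_seen)
    age = [a + 1 for a in obs.age]
    last_seen[visited] = true_states[visited]
    age[visited] = 0
    return Observation(uav_pos, last_seen, age)


def step_cost(true_states: List[SensorState], tx_failed: bool) -> int:
    """Assumed cost: packets exceeding buffer capacity, plus one per failed transmission."""
    overflow = sum(max(0, s.buffer_len - s.buffer_cap) for s in true_states)
    return overflow + (1 if tx_failed else 0)
```

A learning agent in this setting would select the next sensor to visit from an `Observation` alone, while the cost is computed from the true states that it cannot fully see.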