Unmanned Combat Aerial Vehicles (UCAVs) play a significant role in modern military conflicts, as they can perform intelligence, surveillance, reconnaissance, and target acquisition missions while carrying ordnance such as missiles, bombs, and Anti-Tank Guided Missiles (ATGMs). However, the increased use of UCAVs has also driven the development of more advanced anti-UCAV air defense solutions. This paper proposes a deep reinforcement learning approach for generating online missile-evading maneuvers for combat aerial vehicles. The problem is complicated by the speed disparity between the missile, which travels at Mach 8, and the aircraft, which is limited to Mach 2.5. The system employs Twin Delayed Deep Deterministic Policy Gradient (TD3), one of the best-known deep reinforcement learning algorithms, to train an agent to make real-time decisions on the best evasion tactics in a complex combat environment. A two-term reward function is used, combining sparse rewards at terminal states with continuous rewards derived from the combat geometry. Aileron, rudder, and elevator controls are given directly to the algorithm so that the full range of potential escape maneuvers remains available to the agent. The proposed methodology achieved a 59% success rate in extensive simulations, demonstrating its potential to enhance aerial vehicles' combat capabilities.
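The two-term reward structure described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual reward: the function name, weights, and lethal-radius threshold are assumptions, and the continuous term here uses only the missile-aircraft separation as a stand-in for the full combat geometry.

```python
import numpy as np

HIT_RADIUS = 50.0  # m; assumed missile lethal radius, not from the paper


def evasion_reward(missile_pos, aircraft_pos, prev_distance, done, evaded):
    """Return (reward, distance) for one simulation step.

    Combines a continuous shaping term based on the change in
    missile-aircraft separation with a sparse terminal reward,
    mirroring the two-term structure described in the abstract.
    """
    distance = float(
        np.linalg.norm(np.asarray(missile_pos) - np.asarray(aircraft_pos))
    )
    # Continuous term: reward the agent for increasing the miss
    # distance from one step to the next (weight is illustrative).
    shaping = 0.01 * (distance - prev_distance)
    if done:
        # Sparse term: a large signal delivered only at terminal states.
        terminal = 100.0 if evaded else -100.0
        return shaping + terminal, distance
    return shaping, distance


# Example step: the aircraft has opened the range from 1350 m, so the
# continuous shaping term is positive.
r, d = evasion_reward([0.0, 0.0, 8000.0], [1000.0, 0.0, 7000.0],
                      prev_distance=1350.0, done=False, evaded=False)
```

In a TD3 training loop, this per-step reward would be returned by the environment after each application of the aileron, rudder, and elevator commands chosen by the agent.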