In this study, we present a reinforcement learning (RL)-based flight control system design method that improves the transient response performance of a closed-loop reference model (CRM) adaptive control system. The method, referred to as RL-CRM, generates a dynamic adaptation strategy by applying RL to the variable factor in the feedback path gain matrix of the reference model. An actor-critic RL agent is designed using performance-driven reward functions and tracking-error observations from the environment. In the training phase, a deep deterministic policy gradient (DDPG) algorithm is used to learn the time-varying adaptation strategy of the design parameter in the reference model feedback gain matrix. The proposed control structure makes it possible to learn numerous adaptation strategies across a wide range of flight and vehicle conditions, rather than relying on high-fidelity simulators, flight tests, or real flight operations. The performance of the proposed system was evaluated on an identified and verified mathematical model of an agile quadrotor platform. Monte Carlo simulations and a worst-case analysis were also performed on a benchmark helicopter example model. Compared with classical model reference adaptive control and CRM adaptive control system designs, the proposed RL-CRM adaptive flight control system improves transient response performance on all associated metrics and can operate over a wide range of parametric uncertainties.
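To illustrate the core idea, the following is a minimal sketch, not the paper's implementation: a scalar plant tracks a CRM whose feedback gain L is the adaptable design parameter, an agent's deterministic policy maps the tracking error to L, and a gradient-free random search stands in for the DDPG actor-critic updates used in the paper. All dynamics, gains, and the reward shaping here are hypothetical stand-ins for the quadrotor model and reward functions described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar setting (illustrative stand-in for the paper's vehicle models):
#   plant:         x_dot  = a*x + u,                a uncertain
#   CRM reference: xm_dot = am*xm + bm*r - L*(xm - x)
# L is the variable feedback gain the agent adapts from the tracking error.
am, bm, dt, T = -2.0, 2.0, 0.01, 300

def rollout(policy, a=-0.5, r=1.0):
    """Simulate step-reference tracking; return the cumulative reward."""
    x = xm = 0.0
    total = 0.0
    for _ in range(T):
        e = x - xm                      # tracking error (observation)
        L = policy(e)                   # agent selects the CRM feedback gain
        u = am * x + bm * r             # simple model-following control law
        x += dt * (a * x + u)
        xm += dt * (am * xm + bm * r - L * (xm - x))
        total += -e * e                 # performance-driven reward: penalize
    return total                        # transient tracking error

# Deterministic policy L(e) = softplus(w0 + w1*|e|): gain grows with error,
# stays positive (a hypothetical parameterization, not the paper's network).
def make_policy(w):
    return lambda e: np.log1p(np.exp(w[0] + w[1] * abs(e)))

# Hill climbing in policy-parameter space stands in for DDPG training.
w = np.zeros(2)
best = rollout(make_policy(w))
for _ in range(200):
    cand = w + 0.1 * rng.standard_normal(2)
    score = rollout(make_policy(cand))
    if score > best:
        w, best = cand, score
```

In this sketch the learned gain schedule tightens the reference model toward the plant when the transient error is large, which is the mechanism the RL-CRM structure exploits; the paper's actual agent replaces the random search with DDPG over a neural actor-critic pair.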