In recent years, machine learning (ML) techniques have gained popularity for real-time transient stability assessment and prediction, enabling early detection of blackouts in power systems. Conventionally, synchrophasor measurements, as real-time indicators of system dynamics, are fed into the ML-based models. However, if the quality of the synchrophasors used in developing and applying the ML models is not validated, the models can become unreliable, both because of unrealistic quantities obtained through simulations and because of erroneous measurements encountered during application. In this paper, after investigating the properties of different simulation methods, a new hybrid-type simulator that generates a realistic dataset in a feasible time is proposed. Using this simulator, the distortion of the time-series data caused by the dynamics of practical phasor measurement units (PMUs) following a disturbance is analysed, and the intervals in which the PMU measurements are significantly erroneous are determined. Moreover, a new method of arranging the time-series data in the dataset used by the ML models is proposed. With this method, the erroneous parts of the time-series measurements are effectively removed, while the remaining relevant information is retained to enhance the transient stability prediction accuracy.
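
As a rough illustration of the idea of removing an erroneous interval from a PMU time series while keeping the remaining samples, a minimal Python sketch is given below. It is not the arrangement method proposed in this paper, whose details are not stated in the abstract. The function name `drop_erroneous_interval`, the 20 ms reporting interval, the two measurement channels, and the 60 ms to 120 ms erroneous window are all hypothetical values chosen only for the example.

```python
import numpy as np


def drop_erroneous_interval(t_ms, x, start_ms, end_ms):
    """Drop samples whose timestamp lies in [start_ms, end_ms).

    t_ms : 1-D array of integer timestamps in milliseconds
    x    : 2-D array of shape (n_samples, n_channels)
    The remaining samples are returned in their original order.
    """
    t_ms = np.asarray(t_ms)
    x = np.asarray(x, dtype=float)
    # Boolean mask: True for samples outside the erroneous window.
    keep = (t_ms < start_ms) | (t_ms >= end_ms)
    return t_ms[keep], x[keep]


# Hypothetical data: 10 PMU frames, one every 20 ms, two channels
# (for example a voltage magnitude and a phase angle).
t_ms = np.arange(10) * 20
x = np.column_stack([
    np.linspace(1.0, 0.9, 10),
    np.linspace(0.0, 0.5, 10),
])

# Assumed erroneous window after a disturbance: 60 ms to 120 ms.
t_kept, x_kept = drop_erroneous_interval(t_ms, x, start_ms=60, end_ms=120)
```

Integer millisecond timestamps are used here so that the window boundaries compare exactly. In practice, the window would come from an analysis of the PMU response such as the one described above, not from fixed constants.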