In this study, a high-order compact finite difference scheme for the solution of fluid flow problems is implemented to run on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). In addition to the compact scheme, a high-order low-pass filter is employed. Time integration uses the classical fourth-order Runge-Kutta method. Two basic flows, the advection of a vortical disturbance and a temporal mixing layer, are chosen to apply this numerical method on a Tesla C1060, one of NVIDIA's scientific computing GPUs. The results are compared, in terms of calculation time, with those obtained on a single-core CPU (AMD Phenom 2.5 GHz). The CPU code uses the LAPACK/BLAS libraries to solve the cyclic tridiagonal systems generated by the compact solution and filtering schemes, whereas the GPU code solves the same linear systems by applying the inverse of the coefficient matrix through the CUBLAS library. Moreover, the shared memory of the GPU is used to ease coalescing issues in some parts of the GPU code. Speedups of 9x to 16.5x over the CPU computations are achieved, depending on the mesh size. (C) 2011 Elsevier Ltd. All rights reserved.
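The inverse-matrix approach to the cyclic tridiagonal systems can be illustrated with a minimal sketch. The abstract does not state the coefficients of the compact scheme, so the sketch assumes the classical fourth-order Padé first-derivative scheme on a periodic grid, (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = (3/2) (f_{i+1} - f_{i-1}) / (2h). It is written in plain Python rather than CUDA, with an ordinary matrix-vector product standing in for the CUBLAS call; all function names are hypothetical and are not taken from the paper's code.

```python
import math


def cyclic_tridiagonal(n, alpha):
    """Dense n x n matrix: 1 on the diagonal, alpha on the periodic off-diagonals (n >= 3)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
        A[i][(i - 1) % n] = alpha  # wraps around at i = 0
        A[i][(i + 1) % n] = alpha  # wraps around at i = n - 1
    return A


def invert(A):
    """Gauss-Jordan inversion with partial pivoting; done once, before time stepping."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in M]


def matvec(A, x):
    """Dense matrix-vector product (the step a GPU BLAS routine would perform)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def compact_derivative(f, h, A_inv):
    """First derivative on a periodic grid with the assumed fourth-order Pade scheme."""
    n = len(f)
    # Right-hand side: (3/2) * (f[i+1] - f[i-1]) / (2h) = 0.75 * (...) / h
    rhs = [0.75 * (f[(i + 1) % n] - f[(i - 1) % n]) / h for i in range(n)]
    return matvec(A_inv, rhs)


# Usage: differentiate sin(x) on a periodic grid and compare with cos(x).
n = 32
h = 2.0 * math.pi / n
x = [i * h for i in range(n)]
A_inv = invert(cyclic_tridiagonal(n, 0.25))  # computed once and reused
df = compact_derivative([math.sin(xi) for xi in x], h, A_inv)
max_error = max(abs(d - math.cos(xi)) for d, xi in zip(df, x))
```

Because the coefficient matrix is fixed for a given mesh, its inverse can be computed once and then reused at every Runge-Kutta stage. Each solve then becomes a dense matrix product, an operation that maps well onto GPU hardware, even though it costs more arithmetic than a sequential tridiagonal solver.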