Parallel computing methods are very useful for speeding up algorithms that can be divided into independent subtasks. Traditional multi-processor architectures have seen limited use because of their high cost and difficulty of use. Recently, Graphics Processing Units (GPUs) have opened up a new era for general-purpose parallel computation. Among the many GPU programming frameworks, Compute Unified Device Architecture (CUDA) appears to be the most widely used, owing to its low cost and ease of use. In this paper, we show how to implement our recently proposed edge segment detector, the Edge Drawing (ED) algorithm, in CUDA, and present performance studies demonstrating the gains of the CUDA implementation over a uniprocessor CPU implementation. The results show that the CUDA implementation improves the running time of ED by up to 12×, and that ED runs in about 1 ms on a 512×512 image. We also run ED on several different CUDA cards and report the performance results. © 2012 IEEE.