Automatic road extraction from historical maps is important for understanding past transportation conditions and for conducting spatio-temporal analyses that reveal information about historical events and human activities over the years. This research aimed to determine the most suitable architecture, encoder and hyperparameter settings for historical road extraction. We used a dataset of 7076 patches of 256 × 256 pixels generated from scanned historical Deutsche Heereskarte 1:200,000 Türkei (DHK 200 Turkey) maps, together with their corresponding digitized ground-truth masks for five different road types. We first tested the widely used Unet++ and DeepLabv3 architectures. We then evaluated the contribution of attention mechanisms by implementing Unet++ with the concurrent Spatial and Channel Squeeze & Excitation (scSE) block, as well as the Multi-scale Attention Net. The best results were obtained with the Unet++ architecture and the Split-Attention Network (Timm-ResNest200e) encoder, yielding 98.99 % overall accuracy, 41.99 % intersection over union, 51.41 % precision, 69.7 % recall and 57.72 % F1 score. Our trained model weights can be used directly for inference on other DHK maps and for transfer learning to similar or different historical maps. The proposed architecture could also be applied in other road extraction studies.
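The five reported scores are all derived from pixel-level true/false positives and negatives. The following is a minimal sketch, assuming the metrics are computed per road class on flattened binary masks; the function name `segmentation_metrics` is hypothetical, and the exact averaging used in the study (per patch, per class, or over the whole test set) is not stated here and may differ.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for one binary road class (hypothetical helper).

    pred, truth: equal-length sequences of 0/1 labels (flattened masks),
    where 1 marks a road pixel and 0 marks background.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Count confusion-matrix entries pixel by pixel.
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1  # road predicted, road in ground truth
        elif p:
            fp += 1  # road predicted, background in ground truth
        elif t:
            fn += 1  # background predicted, road in ground truth
        else:
            tn += 1  # background predicted and in ground truth

    def safe_div(a, b):
        # Avoid division by zero for empty classes or empty predictions.
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "overall_accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "iou": safe_div(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0, 0, 0]
    truth = [1, 0, 1, 0, 1, 0, 0, 0]
    print(segmentation_metrics(pred, truth))
```

Because background pixels (true negatives) dominate thin road features, overall accuracy can stay close to 100 % while IoU remains much lower, which is consistent with the gap between the reported accuracy and IoU values. Note also that when scores are averaged over patches or classes, the averaged F1 need not equal the F1 computed from the averaged precision and recall.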