Remote sensing scene classification is a critical computer vision task that involves assigning land areas to predefined classes based on very high-resolution remotely sensed data. Deep learning architectures, from classical convolutional and residual neural networks to relatively new attention-based networks, have shown great potential for achieving high accuracy on this task. With the increasing availability of remote sensing data and continued advances in deep learning, modern architectures such as ConvNeXt and vision transformers have likewise shown considerable potential. In this paper, we present a comprehensive evaluation of modern deep learning architectures for remote sensing scene classification. Preliminary experiments show that models from the ResNet family offer a better tradeoff between accuracy and speed than the modern networks.