Finite element method (FEM) is a well-developed method to solve real-world problems that can be modeled with differential equations. As the available computational power increases, complex and large-size problems can be solved using FEM, which typically involves multiple degrees of freedom (DOF) per node, high order of elements, and an iterative solver requiring several sparse matrix-vector multiplication operations. In this work, a new storage scheme is proposed for sparse matrices arising from FEM simulations with multiple DOF per node. A sparse matrix-vector multiplication kernel and its variants using the proposed scheme are also given for CUDA-enabled GPUs. The proposed scheme and the kernels rely on the mesh connectivity data from FEM discretization and the number of DOF per node. The proposed kernel performance was evaluated on seven test matrices for double-precision floating point operations. The performance analysis showed that the proposed GPU kernel outperforms the ELLPACK (ELL) and CUSPARSE Hybrid (HYB) format GPU kernels by an average of 42% and 32%, respectively, on a Tesla K20c card. Copyright (c) 2016 John Wiley & Sons, Ltd.