We study a bio-medical fluid flow simulation using the incompressible, laminar OpenFOAM flow solver icoFoam using iterative linear equation solver and direct solvers (kernel class) such as SuperLU_DIST 3.3 and SuperLU_MCDT (Many-Core Distributed) for the large penta-diagonal and hepta-diagonal matrices coming from the simulation of blood flow in arteries with a structured mesh domain. A realistic simulation for the flow of blood in the heart or vessels in the whole body is a complex problem and may take a very long time, thousands of hours, for the main tasks such as pre-processing (meshing), decomposition and solving the large linear systems. Our aim is to test the potential scaling capability of the fluid solver for multi-petascale systems. We started from the relatively small instances for the whole simulation and solved large linear systems. We measured the wall clock time of single time steps of the simulation. This version gives important clues for a larger version of the problem. Later, we increase the problem size and the number of time steps to obtain a better picture gradually, in our general strategy. We test the performance of the solver icoFoam at TGCC Curie (a Tier-0 system) at CEA, France (see ). We achieved scaled speed-up for the largest matrices of 64 million million in our dataset to run up to 16,384 cores. In other words, we find that the scalability improves as the problem size increases for this application. As the matrix size quadrupled, the speed-up improves at least 50 % near speed-up saturation point. This shows that there is no structural problem in the software up to this scale. This is an important and encouraging result for the problem. Moreover, we imbedded other direct solvers (kernel class) such as SuperLU_DIST 3.3 and SuperLU_MCDT in addition to the solvers provided by OpenFOAM. Since future exascale systems are expected to have heterogeneous and many-core distributed nodes, we believe that our SuperLU_MCDT software is a good candidate for future systems. SuperLU_MCDT worked up to 16,384 cores for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation, without any problem.