The performance of an algorithm depends on both the computer architecture and the software. An Intel Xeon based HPC cluster and an Intel Itanium 2 based symmetric multiprocessing (SMP) system are used to analyze the performance of a PDE-based parallel algorithm. The algorithm is parallelized using MPI, and performance is measured with the Tuning and Analysis Utilities (TAU). Computational optimization exposes data independence and helps the compiler generate more efficient code for the specific processor. Removing data dependencies inside loops is the key optimization in this work. In iterative algorithms such as the Gauss-Seidel method, each processor communicates with the same processors at every iteration. This fixed communication pattern makes persistent connections preferable, since a connection can be set up once and reused at every iteration.
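As an illustration of what removing a data dependency inside a loop can look like for Gauss-Seidel, the sketch below compares a natural-order sweep with a red-black ordered sweep on a 1D Laplace problem. The red-black ordering is an assumption chosen for illustration, not necessarily the reordering used in this work, and the function names `gauss_seidel_natural` and `gauss_seidel_red_black` are hypothetical. Python is used only for brevity; the same loop structure carries over to a compiled language.

```python
def gauss_seidel_natural(u, sweeps):
    """Natural-order Gauss-Seidel for u'' = 0 with fixed end values.

    Each update reads u[i - 1], which was written earlier in the same
    sweep, so the inner loop has a loop-carried data dependency.
    """
    u = list(u)  # work on a copy; the end points act as boundary values
    n = len(u)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u


def gauss_seidel_red_black(u, sweeps):
    """Red-black Gauss-Seidel (illustrative assumption, see lead-in).

    Interior points are split into odd and even indices. An update of
    one colour reads only points of the other colour (or fixed boundary
    points), so the iterations of each inner loop are independent.
    """
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        for start in (1, 2):  # odd-indexed points first, then even-indexed
            for i in range(start, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

Both variants converge to the same solution (a straight line between the two boundary values), but only the red-black inner loops are free of loop-carried dependencies, which is the property a vectorizing compiler or a parallel decomposition can exploit.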