This paper describes an approach to mobile robot localization based on visual-word place recognition that exploits the benefits of a stereo camera system. Visual words computed from SIFT features are combined with VIP (viewpoint invariant patch) features, which use depth information from the stereo setup. The approach was evaluated in the ImageCLEF@ICPR 2010 competition(1). The results achieved on the competition datasets are reported in this paper.