We propose a visual object tracker that improves accuracy while significantly decreasing the false alarm rate. This is achieved by a late fusion scheme that integrates the motion model of particle sampling with the region proposal network of Mask R-CNN during inference. The qualified bounding boxes selected by the late fusion are fed into Mask R-CNN's head layer for detection of the tracked object. We refer to the introduced scheme as TAVOT, a target-aware visual object tracker, since it is capable of minimizing false detections under the guidance of variable-rate particle sampling initialized by the target region of interest. It is shown that TAVOT is capable of modeling temporal video content with a simple motion model and thus constitutes a promising video object tracker. Performance evaluation on the VOT2016 video sequences demonstrates that TAVOT increases the success rate by 22% while decreasing the false alarm rate by 73% compared to the baseline Mask R-CNN. Compared to the top tracker of VOT2016, an increase of around 5% in success rate is reported at intersection-over-union thresholds greater than 0.5.