In this paper, a scene interpretation system is proposed that enables cognitive robots to detect failures during action execution. The system combines object recognition and segmentation results to maintain a consistent model of the world. Known objects in the scene are recognized using both color and depth information, while unknown objects are segmented by Euclidean clustering of the depth data. In addition to object locations, the world model includes spatial relations that are useful in a tabletop object manipulation scenario: on, on_table, clear, and near. Experiments using data from the onboard RGB-D sensors of our Pioneer 3-AT and Pioneer 3-DX robots show that the proposed system can successfully build a consistent world model, including spatial relations, in an object manipulation scenario.
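The abstract does not give the clustering parameters or the exact definitions of the spatial relations. As an illustration only, the sketch below assumes a naive Euclidean clustering (region growing over 3D points that lie within a fixed distance tolerance of each other, the idea behind PCL's Euclidean cluster extraction) and defines on, on_table, clear, and near over axis-aligned bounding boxes. All function names, thresholds, and the table height are hypothetical and are not the authors' definitions.

```python
import math
from collections import deque
from dataclasses import dataclass

Point = tuple[float, float, float]  # (x, y, z) in metres; z points up from the table

def euclidean_clusters(points, tolerance=0.02, min_size=3):
    """Group points connected by chains of neighbours closer than `tolerance`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Naive O(n^2) neighbour search; a k-d tree would normally be used.
            neighbours = [j for j in unvisited
                          if math.dist(points[i], points[j]) <= tolerance]
            for j in neighbours:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:  # discard tiny clusters as noise
            clusters.append([points[k] for k in cluster])
    return clusters

@dataclass
class Box:
    lo: Point  # minimum corner
    hi: Point  # maximum corner

def bounding_box(cluster):
    xs, ys, zs = zip(*cluster)
    return Box((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))

def overlaps_xy(a, b):
    # True if the footprints of a and b intersect in the table plane.
    return (a.lo[0] <= b.hi[0] and b.lo[0] <= a.hi[0]
            and a.lo[1] <= b.hi[1] and b.lo[1] <= a.hi[1])

def on(a, b, tol=0.02):
    # Hypothetical rule: a rests on b if footprints overlap and a's bottom meets b's top.
    return overlaps_xy(a, b) and abs(a.lo[2] - b.hi[2]) <= tol

def on_table(a, table_z=0.0, tol=0.02):
    # Hypothetical rule: a's bottom is at the (assumed known) table height.
    return abs(a.lo[2] - table_z) <= tol

def clear(a, objects, tol=0.02):
    # a is clear if no other object is on it.
    return not any(on(o, a, tol) for o in objects if o is not a)

def near(a, b, threshold=0.15):
    # Hypothetical rule: bounding-box centres closer than `threshold`.
    ca = [(l + h) / 2 for l, h in zip(a.lo, a.hi)]
    cb = [(l + h) / 2 for l, h in zip(b.lo, b.hi)]
    return a is not b and math.dist(ca, cb) <= threshold

def grid(x0, y0, z0, n=3, step=0.01):
    # Small synthetic point cube used only for the example.
    return [(x0 + i * step, y0 + j * step, z0 + k * step)
            for i in range(n) for j in range(n) for k in range(n)]

if __name__ == "__main__":
    clusters = euclidean_clusters(grid(0, 0, 0) + grid(0.5, 0.5, 0))
    print(len(clusters), [on_table(bounding_box(c)) for c in clusters])
    b = Box((0, 0, 0), (0.1, 0.1, 0.1))           # object on the table
    a = Box((0.02, 0.02, 0.1), (0.08, 0.08, 0.15))  # object stacked on b
    print(on(a, b), clear(b, [a, b]), near(a, b))
```

A fixed tolerance keeps the sketch simple; note that two objects in direct contact (for example, a stacked pair) would merge into one cluster under this rule, which is one reason a system like the one described would combine clustering with object recognition rather than rely on segmentation alone.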