In many human-computer interaction experiments, eye-tracker camera and webcam audio and video data are collected simultaneously while a participant watches a given video. To identify the participant's actions during the experiment from the video, the webcam and eye-tracker camera data must be aligned. Most studies also require comparing the behaviour of different people during the same part of the experiment, so the camera data of different participants must be aligned as well. In this study, we devise a framework based on audio synchronization and image retrieval techniques for these two alignment tasks. Aligning and visualizing these different data types enables experimenters to analyze the behaviour of a single participant and to efficiently compare the behaviours of different groups of people.
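A common way to realize the audio-synchronization step mentioned above is cross-correlation of the two audio tracks: the lag that maximizes the correlation gives the time offset between recordings. The sketch below is only an illustration of that general technique, not the paper's actual implementation; the function name `estimate_offset` and the synthetic signals are assumptions introduced here.

```python
import numpy as np

def estimate_offset(ref_audio, other_audio, sample_rate):
    """Estimate how many seconds other_audio lags behind ref_audio
    by locating the peak of their full cross-correlation.
    (Illustrative sketch; real recordings would first be resampled
    to a common rate and possibly band-pass filtered.)"""
    corr = np.correlate(other_audio, ref_audio, mode="full")
    # Zero lag sits at index len(ref_audio) - 1 of the full correlation.
    lag_samples = int(np.argmax(corr)) - (len(ref_audio) - 1)
    return lag_samples / sample_rate

# Synthetic check: the same noise-like signal delayed by 0.5 s at 1 kHz.
sr = 1000
rng = np.random.default_rng(0)
sig = rng.normal(size=2 * sr)                      # 2 s of broadband signal
delayed = np.concatenate([np.zeros(sr // 2), sig])  # 0.5 s leading silence
print(estimate_offset(sig, delayed, sr))  # → 0.5
```

Once the pairwise offset is known, the webcam and eye-tracker streams (or the streams of two different participants watching the same stimulus video) can be trimmed or time-shifted to a common clock.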