Video Action Classification by Deep Learning

Ergün E., Gurkan F., Kaplan O., Günsel Kalyoncu B.

25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey, 15-18 May 2017

  • Publication Type: Conference Paper / Full Text
  • City: Antalya
  • Country: Turkey
  • Istanbul Technical University Affiliated: Yes


The purpose of this study is to learn and classify video activities using the color and motion information in video. Video activity labeling is important for many applications, such as video content modeling, indexing, and quick access to content. In this study, video activity recognition is performed by deep learning. To learn the visual features of video, Convolutional Neural Network (CNN) layers are stacked with layers of a special type of recurrent network, Long Short-Term Memory (LSTM). Video sequence learning is performed by end-to-end training. Recent works on deep learning employ color and motion information together to improve learning and classification accuracy. In this study, unlike the existing models, video motion content is learned using SIFT flow vectors, and motion and color features are fused for activity recognition. Performance tests on a commonly used benchmark data set, UCF101, which includes activity-labeled videos from 101 action categories such as "Biking" and "Playing Guitar", demonstrate that SIFT flow vectors model motion information more accurately than optical flow vectors and increase video motion classification performance.
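The abstract does not say how the motion and color features are fused. As an illustration only, the sketch below assumes score-level (late) fusion: each stream produces one raw score per action class, the scores are converted to probabilities, and the two probability vectors are combined by a weighted average. The function names, the `motion_weight` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(color_scores, motion_scores, motion_weight=0.5):
    """Weighted average of per-class probabilities from the two streams.

    Assumption: late fusion. The paper may fuse features at another stage.
    """
    if len(color_scores) != len(motion_scores):
        raise ValueError("both streams must score the same classes")
    p_color = softmax(color_scores)
    p_motion = softmax(motion_scores)
    return [
        (1.0 - motion_weight) * c + motion_weight * m
        for c, m in zip(p_color, p_motion)
    ]


def predict(labels, color_scores, motion_scores, motion_weight=0.5):
    """Return the label with the highest fused probability, and that probability."""
    fused = fuse_streams(color_scores, motion_scores, motion_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


if __name__ == "__main__":
    # Made-up scores for two of the UCF101 categories named in the abstract.
    labels = ["Biking", "Playing Guitar"]
    print(predict(labels, color_scores=[2.0, 1.0], motion_scores=[0.0, 3.0]))
```

Setting `motion_weight` to 0 or 1 reduces the prediction to the color stream or the motion stream alone, which gives a simple way to compare each stream against the fused result.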