Online Speaker Emotion Tracking with a Dynamic State Transition Model

Cirakman O., Günsel Kalyoncu B.

23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 December 2016, pp. 307-312

  • Publication Type: Conference Paper / Full Text
  • City: Cancun
  • Country: Mexico
  • Page Numbers: pp.307-312
  • Istanbul Technical University Affiliated: Yes


Although emotional state recognition from voice has been studied extensively, little work has focused on online emotion recognition. Because the duration and intensity of emotional experiences change over time, existing static transition models are hard to apply when monitoring emotional states, especially in an online setting. To overcome this difficulty, we introduce a method that combines particle filter tracking for switching observation models with emotional state classification. Adopting the Active Field State Space (AFSS) used in modeling human social interactions, we formulate a dynamic state transition model in the continuous arousal-valence-stance space. Under the assumption that the target posterior of each emotional state is a Gaussian mixture model (GMM) with an unknown number of mixture components, the observation model is constructed through a training scheme in which Dirichlet process mixture (DPM) models of the emotional states are learned via sequential Monte Carlo (SMC) sampling. The online speaker emotional state labeling performance of the proposed method is evaluated on long speech sequences containing emotional drift and transitions. Test sequences are simulated from EmoDB based on the AFSS interaction model. Formulating emotional state classification as an online tracking problem is shown to provide a considerable improvement over the standard maximum likelihood classification approach. Test results demonstrate that the introduced method achieves 83% accuracy in an online setting, which is comparable to the performance of existing offline methods.
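
The general idea of tracking a discrete emotional state with a particle filter whose observation model switches with the state can be illustrated with a minimal sketch. This is not the authors' implementation, and several simplifications are assumptions made only for illustration: observations are treated as points in a three-dimensional arousal-valence-stance space, each state's GMM has fixed, hand-picked parameters (the abstract learns them as DPM models via SMC sampling), and the transition is a fixed "stay or jump" rule rather than the AFSS-based dynamic model. The names `gmm_logpdf`, `track_states`, and `stay_prob` are hypothetical.

```python
import math
import random


def gmm_logpdf(y, gmm):
    """Log-density at y of a diagonal-covariance GMM.

    `gmm` is a list of (weight, mean, variance) triples; this layout is an
    assumption of the sketch, not something specified by the abstract.
    """
    terms = []
    for weight, mean, var in gmm:
        log_comp = sum(
            -0.5 * ((yi - mi) ** 2 / vi + math.log(2.0 * math.pi * vi))
            for yi, mi, vi in zip(y, mean, var)
        )
        terms.append(math.log(weight) + log_comp)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def track_states(observations, gmms, stay_prob=0.9, n_particles=500, seed=0):
    """Bootstrap particle filter over a discrete state label.

    Each particle carries one state label; the GMM used to weight a particle
    is selected by that label (the switching observation model).
    """
    rng = random.Random(seed)
    n_states = len(gmms)
    labels = [rng.randrange(n_states) for _ in range(n_particles)]
    decoded = []
    for y in observations:
        # Propagate: keep the label with probability stay_prob, otherwise
        # jump to a different state chosen uniformly (assumed transition).
        if n_states > 1:
            labels = [
                s if rng.random() < stay_prob
                else (s + rng.randrange(1, n_states)) % n_states
                for s in labels
            ]
        # Weight: likelihood of the observation under each particle's GMM.
        loglik = [gmm_logpdf(y, g) for g in gmms]
        top = max(loglik[s] for s in labels)
        weights = [math.exp(loglik[s] - top) for s in labels]
        # Decode: pick the state holding the largest total particle weight.
        mass = [0.0] * n_states
        for s, w in zip(labels, weights):
            mass[s] += w
        decoded.append(max(range(n_states), key=mass.__getitem__))
        # Resample (multinomial) in proportion to the weights.
        labels = rng.choices(labels, weights=weights, k=n_particles)
    return decoded


# Hypothetical two-state example with made-up GMM parameters.
gmms = [
    [(1.0, (-1.0, -1.0, 0.0), (0.05, 0.05, 0.05))],
    [(0.5, (1.0, 1.0, 0.0), (0.05, 0.05, 0.05)),
     (0.5, (1.0, 0.5, 0.5), (0.05, 0.05, 0.05))],
]
observations = [(-1.0, -1.0, 0.0)] * 5 + [(1.0, 1.0, 0.0)] * 5
decoded = track_states(observations, gmms)
```

In this sketch the label is decoded at every step from past and current observations only, which is what makes the procedure online. A maximum likelihood classifier would instead pick the state with the highest `gmm_logpdf` value for each observation independently, ignoring the temporal prior carried by the particles.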