Online Speaker Emotion Tracking with a Dynamic State Transition Model

23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 December 2016, pp. 307-312

Publication Type: Conference Paper / Full-Text Paper
City of Publication: Cancun
Country of Publication: Mexico
Pages: pp. 307-312
Istanbul Technical University Affiliated: Yes

Abstract

Although emotional state recognition from voice has been studied extensively, little effort has focused on online emotion recognition. Because the duration and intensity of emotional experiences change over time, existing static transition models are hard to apply when monitoring emotional states, especially in an online setting. To overcome this difficulty, we introduce a method that combines particle filter tracking for switching observation models with emotional state classification. Adopting the Active Field State Space (AFSS) used in modeling human social interactions, we formulate a dynamic state transition model in the continuous arousal-valence-stance space. Under the assumption that the target posterior of each emotional state is a Gaussian mixture model (GMM) with an unknown number of mixture components, the observation model is constructed through a training scheme in which Dirichlet process mixture (DPM) models of the emotional states are learned via sequential Monte Carlo (SMC) sampling. The online speaker emotional state labeling performance of the proposed method is evaluated on long speech sequences containing emotional drift and transitions. Test sequences are simulated from EmoDB based on the AFSS interaction model. Formulating emotional state classification as an online tracking problem is shown to provide a considerable improvement over the standard maximum likelihood classification approach. Test results demonstrate that the proposed method achieves 83% accuracy in an online setting, which is comparable to the performance of existing offline methods.
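
To illustrate why treating classification as tracking can help, the following is a minimal sketch of a bootstrap particle filter over discrete emotion labels, compared with a frame-wise maximum likelihood baseline. It is not the method of the paper: the one-dimensional feature, the per-emotion Gaussian means in EMOTION_MEANS, the noise level OBS_STD, and the fixed "sticky" transition probability P_STAY are all hypothetical stand-ins. In particular, the fixed transition replaces the AFSS-based dynamic transition model, and single Gaussians replace the learned DPM observation models.

```python
import math
import random

# Hypothetical 1-D feature means per emotion (stand-ins for learned mixture models).
EMOTION_MEANS = {"neutral": 0.0, "anger": 2.0, "sadness": -2.0}
OBS_STD = 1.0   # assumed observation noise
P_STAY = 0.9    # assumed probability of keeping the current state (static, not AFSS)


def likelihood(obs, label):
    """Gaussian likelihood of one observation under one emotion's model."""
    z = (obs - EMOTION_MEANS[label]) / OBS_STD
    return math.exp(-0.5 * z * z) / (OBS_STD * math.sqrt(2.0 * math.pi))


def ml_label(obs):
    """Frame-wise maximum likelihood baseline with no temporal model."""
    return max(EMOTION_MEANS, key=lambda lab: likelihood(obs, lab))


def track(observations, n_particles=500, seed=0):
    """Bootstrap particle filter; each particle is a hypothesised emotion label."""
    rng = random.Random(seed)
    labels = list(EMOTION_MEANS)
    particles = [rng.choice(labels) for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        # Predict: keep the label with probability P_STAY, else jump to another label.
        moved = []
        for p in particles:
            if rng.random() < P_STAY:
                moved.append(p)
            else:
                moved.append(rng.choice([lab for lab in labels if lab != p]))
        # Update: weight each particle by the observation model of its label.
        weights = [likelihood(obs, p) for p in moved]
        # Estimate: report the label carrying the largest total weight.
        mass = {lab: 0.0 for lab in labels}
        for p, w in zip(moved, weights):
            mass[p] += w
        estimates.append(max(mass, key=mass.get))
        # Resample in proportion to the weights (skipped if all weights underflow).
        if sum(weights) > 0.0:
            particles = rng.choices(moved, weights=weights, k=n_particles)
        else:
            particles = moved
    return estimates


if __name__ == "__main__":
    frames = [0.0, 0.0, 0.0, 0.0, 1.2, 0.0, 0.0]  # one ambiguous frame at index 4
    print("ML     :", [ml_label(x) for x in frames])
    print("Tracker:", track(frames))
```

Under these assumed parameters, the baseline labels the ambiguous frame (1.2) as anger because it looks at that frame alone, while the tracker tends to keep the neutral label because the accumulated particle mass favours the state it has been following. A sustained run of frames near 2.0 would still move the tracker to anger after a short delay.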