Medium term speaker state detection by perceptually masked spectral features


SPEECH COMMUNICATION, vol.67, pp.26-41, 2015 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 67
  • Publication Date: 2015
  • Doi Number: 10.1016/j.specom.2014.09.002
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.26-41
  • Istanbul Technical University Affiliated: No


We propose a method based on perceptual prosodic features for medium term speaker state classification, particularly sleepiness detection. Unlike existing methods, our features represent spectral characteristics of speech in perceptual bands and also track temporal content without any linguistic segmentation. In contrast to conventional methods, we aim to model transitions between non-sleepy and sleepy modes rather than emotional states. Together with the proposed compact feature set, the developed system enables discrimination of medium term speaker states with lower complexity than existing systems. This is achieved by constructing a dictionary for speech data based on the bag-of-words concept. We find that a training setup based on learned codewords yields a robust classifier for sleepy speech. Speaker state classification is performed by applying a two-class classification scheme to the observed test data. The numerical results, obtained on the Sleepy Language Corpus (SLC) using a Support Vector Machine (SVM) classifier, demonstrate an average improvement of 10% in unweighted recall rates over the benchmark results. The introduced method is promising for online applications because of its frame-based feature extraction scheme, which differs from conventional segmental descriptor extraction techniques. (C) 2014 Elsevier B.V. All rights reserved.
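
The two ideas named in the abstract, a bag-of-words representation over learned codewords and evaluation by unweighted recall, can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook is assumed to be given (the abstract does not say how it is learned), Euclidean distance is an assumed choice, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import dist


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to a frame (Euclidean distance, assumed)."""
    return min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))


def bag_of_words_histogram(frames, codebook):
    """Assign each frame to its nearest codeword and return normalized counts."""
    if not frames:
        raise ValueError("at least one frame is required")
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


def unweighted_recall(y_true, y_pred):
    """Average the per-class recalls so that each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 2-D frame features and a 2-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
hist = bag_of_words_histogram(frames, codebook)

# Toy labels (hypothetical) for a two-class sleepy / non-sleepy task.
y_true = ["sleepy", "sleepy", "non-sleepy", "non-sleepy", "non-sleepy", "non-sleepy"]
y_pred = ["sleepy", "non-sleepy", "non-sleepy", "non-sleepy", "non-sleepy", "sleepy"]
uar = unweighted_recall(y_true, y_pred)
```

In a full pipeline, the histogram of each utterance would serve as the fixed-length input to a two-class classifier such as an SVM. Unweighted recall is the natural score when the sleepy and non-sleepy classes are imbalanced, since a classifier that always predicts the majority class scores only 0.5.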