Research Outputs | WoS | Scopus | TR-Dizin | PubMed
Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741
3 results
Search Results
Publication (Open Access)
Use of line spectral frequencies for emotion recognition from speech (2010)
Bozkurt, Elif (Koç Üniversitesi, Istanbul, Turkey); Erzin, Engin (Koç Üniversitesi, Istanbul, Turkey); Eroğlu Erdem, Çiğdem (Bahçeşehir Üniversitesi, Istanbul, Turkey); Erdem, Tanju (Özyeğin Üniversitesi, Istanbul, Turkey)
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not previously been employed for this task. Spectral features such as mel-scaled cepstral coefficients have already been used successfully to parameterize speech signals for emotion recognition. The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker [4]. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over MFCC-based emotion classification rates. © 2010 IEEE. © 2010 Elsevier B.V., All rights reserved.

Publication (Metadata only)
INTERSPEECH 2009 emotion recognition challenge evaluation (Turkish title: INTERSPEECH 2009 duygu tanıma yarışması değerlendirmesi) (2010)
Bozkurt, Elif (Koç Üniversitesi, Istanbul, Turkey); Erzin, Engin (Koç Üniversitesi, Istanbul, Turkey); Erdem, Cigdem Eroglu (Bahçeşehir Üniversitesi, Istanbul, Turkey); Erdem, Tanju (Özyeğin Üniversitesi, Istanbul, Turkey)
In this paper we evaluate the INTERSPEECH 2009 Emotion Recognition Challenge results. The challenge poses the problem of accurately classifying natural and emotionally rich FAU Aibo recordings into five and two emotion classes. We evaluate prosody-related, spectral, and HMM-based features with Gaussian mixture model (GMM) classifiers to attack this problem. Spectral features consist of mel-scale cepstral coefficients (MFCC), line spectral frequency (LSF) features, and their derivatives, whereas prosody-related features consist of pitch, the first derivative of pitch, and intensity. We employ unsupervised training of HMM structures with prosody-related temporal features to define HMM-based features. We also investigate data fusion of different features and decision fusion of different classifiers to improve emotion recognition results. Our two-stage decision fusion method achieves 41.59% and 67.90% recall rates for the five-class and two-class problems, respectively, and takes second and fourth place among the overall challenge results. © 2010 IEEE. © 2011 Elsevier B.V., All rights reserved.

Publication (Metadata only)
RANSAC-based training data selection on spectral features for emotion recognition from spontaneous speech (2011)
Bozkurt, Elif (College of Engineering, Koç Üniversitesi, Istanbul, Turkey); Erzin, Engin (College of Engineering, Koç Üniversitesi, Istanbul, Turkey); Erdem, Cigdem Eroglu (Department of Electrical and Electronic Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey); Erdem, Tanju (Department of Electrical and Electronic Engineering, Özyeğin Üniversitesi, Istanbul, Turkey)
Training datasets containing spontaneous emotional speech are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious label instances that may exist in the training dataset. Our experiments using HMMs with Mel Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequency (LSF) features indicate that using RANSAC in the training phase improves the unweighted recall rates on the test set. Experimental studies performed over the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF- and MFCC-based classifiers provide further significant performance improvements. © 2011 Springer-Verlag. © 2011 Elsevier B.V., All rights reserved.
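
Note on the last record: the general idea of RANSAC-style training data selection can be illustrated with a toy sketch. This is not the authors' implementation. The abstract describes HMMs trained on MFCC and LSF features, whereas the sketch below assumes a much simpler stand-in: one scalar feature per training instance and a class model that is just the mean of a random subset. The function name `ransac_select` and the parameters `n_iter`, `subset_size`, and `tol` are hypothetical.

```python
import random
import statistics


def ransac_select(values, n_iter=50, subset_size=3, tol=1.0, seed=0):
    """Return indices of the largest consensus set found (toy RANSAC).

    Assumption: each training instance of one class is a single float, and
    the "model" is the mean of a random subset. The paper uses HMMs instead.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best = []
    for _ in range(n_iter):
        # 1. Fit a candidate model on a small random subset.
        subset = rng.sample(values, subset_size)
        center = statistics.mean(subset)
        # 2. Collect every instance that agrees with the candidate model.
        inliers = [i for i, v in enumerate(values) if abs(v - center) <= tol]
        # 3. Keep the largest consensus set seen so far.
        if len(inliers) > len(best):
            best = inliers
    return best


# 20 consistent instances followed by 3 suspicious (possibly mislabeled) ones.
clean = [0.8 + 0.02 * i for i in range(20)]
suspicious = [9.0, 9.1, 8.9]
kept = ransac_select(clean + suspicious)
print(len(kept), "of", len(clean) + len(suspicious), "instances kept")
```

In a full pipeline the final classifier would be retrained only on the kept instances; here the suspicious values fall outside the consensus set and would be dropped before that retraining step.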
