Title :
Effects of feature type, learning algorithm and speaking style for depression detection from speech
Author :
Mitra, Vikramjit ; Shriberg, Elizabeth
Author_Institution :
SRI Int., Menlo Park, CA, USA
Abstract :
Computational methods for speech-based detection of depression are still relatively new, and have focused either on a standard set of features or on specific additional approaches. We systematically study the effects of feature type, machine learning approach, and speaking style (read versus spontaneous) on depression prediction in the AVEC-2014 evaluation corpus, using features related to speech production, perception, acoustic phonetics, and prosody. Using a multilayer ANN, we find that one feature type, MMEDuSA [2], results in a 25% relative error reduction over the AVEC-2014 baseline system [1] for both mean absolute error (MAE) and root mean squared error (RMSE). Other individual feature types perform comparably to the baseline, but have much lower dimensionality and are simpler to interpret. Further improvements were achieved by fusing diverse features and systems. Finally, results suggest that the relative contribution of different feature types depends on whether the speech is spontaneous or read. Overall, spontaneous speech led to lower error rates than read speech, an important consideration for the collection of future clinical data.
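The error measures named in the abstract can be illustrated with a minimal sketch. The function names and the toy scores below are assumptions made for illustration only; they are not taken from the paper or from the AVEC-2014 data, and the numbers are chosen simply so that the relative reduction works out to 25%.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def relative_error_reduction(baseline_error, system_error):
    """Fraction by which system_error improves on baseline_error."""
    return (baseline_error - system_error) / baseline_error


# Hypothetical toy values, NOT results from the paper.
y_true = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [18.0, 12.0, 38.0, 32.0]  # every prediction is off by 8
system_pred = [16.0, 14.0, 36.0, 34.0]    # every prediction is off by 6

mae_reduction = relative_error_reduction(
    mae(y_true, baseline_pred), mae(y_true, system_pred)
)
rmse_reduction = relative_error_reduction(
    rmse(y_true, baseline_pred), rmse(y_true, system_pred)
)
print(f"MAE reduction: {mae_reduction:.0%}, RMSE reduction: {rmse_reduction:.0%}")
```

In this toy case every error has the same magnitude, so MAE and RMSE coincide. In general RMSE penalizes large errors more heavily, which is why the two measures are reported separately.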
Keywords :
behavioural sciences computing; learning (artificial intelligence); medical signal detection; medical signal processing; multilayer perceptrons; speech processing; statistical analysis; AVEC-2014 evaluation corpus; MAE; MMEDuSA; RMSE; acoustic phonetics; computational methods; depression prediction; feature type; learning algorithm; machine learning approach; mean absolute error; multilayer ANN; prosody; read type; relative error reduction; root mean squared error; speaking style; speech production; speech-based depression detection; spontaneous type; Artificial neural networks; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech recognition; Depression detection; acoustic features; articulatory features; clinical data; neural networks; prosody; robust signal analysis;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178877