Authors:
Park, Chang-wun ; Lee, Dong-Wook ; Sim, Kwee-Bo
Abstract:
Emotion recognition can be performed by various methods, mainly visual or aural. We show that it is possible to recognize a person's emotion from sound data alone, using the pitch of speech as the main feature. Through pitch analysis we define feature patterns for four emotions (normal, angry, laugh, surprise). Based on these feature patterns, we implement a simulator in VC++. The simulator is composed of three modules: 'generation of individuals', 'recurrent neural network (RNN)', and 'evaluation'. Using the result of the simulator's learning stage, we obtain recognition results on other speech data (i.e., data not used for learning). In detail, each module works as follows. First, the generation-of-individuals module uses a (1+100)-ES and a (1+1)-ES (i.e., random search); we compare the results of the two methods and select the better one. Second, the RNN is composed of 7 nodes: 1 input node, 2 hidden-layer nodes, and 4 output nodes. This structure was selected for the characteristics of sequentially inputted speech data. Third, the evaluation module is critical, since it determines the extraction speed and the quality of the result. We implement a simulator from these modules and, applying it to other speech data, observe the recognition results.
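The learning loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation (which was in VC++): a (1+λ)-ES mutates the weights of a tiny recurrent net with 1 input node, 2 hidden nodes, and 4 output nodes (matching the 7-node structure above), so that a pitch sequence is mapped to one of the four emotion classes. All function names, the toy fitness function, and the data format are illustrative assumptions.

```python
import math
import random

# 7-node structure from the abstract: 1 input, 2 hidden, 4 output.
N_IN, N_HID, N_OUT = 1, 2, 4

def init_weights(rng):
    # Flat weight vector: input->hidden, hidden->hidden (recurrent), hidden->output.
    n = N_IN * N_HID + N_HID * N_HID + N_HID * N_OUT
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def forward(w, pitch_seq):
    """Run the recurrent net over a pitch sequence; return 4 emotion scores."""
    w_ih = w[:N_HID]
    w_hh = w[N_HID:N_HID + N_HID * N_HID]
    w_ho = w[N_HID + N_HID * N_HID:]
    h = [0.0] * N_HID
    for x in pitch_seq:  # one pitch value per time step (sequential input)
        h = [math.tanh(w_ih[j] * x +
                       sum(w_hh[i * N_HID + j] * h[i] for i in range(N_HID)))
             for j in range(N_HID)]
    return [sum(w_ho[i * N_OUT + k] * h[i] for i in range(N_HID))
            for k in range(N_OUT)]

def fitness(w, data):
    """Toy evaluation: negative misclassification count (0 is perfect)."""
    errors = 0
    for pitch_seq, label in data:
        out = forward(w, pitch_seq)
        if out.index(max(out)) != label:
            errors += 1
    return -errors

def es_1_plus_lambda(data, lam=100, sigma=0.3, generations=200, seed=0):
    """(1+lambda)-ES: the parent survives unless some mutant is at least as fit."""
    rng = random.Random(seed)
    parent = init_weights(rng)
    best = fitness(parent, data)
    for _ in range(generations):
        children = [[wi + rng.gauss(0.0, sigma) for wi in parent]
                    for _ in range(lam)]
        champ = max(children, key=lambda c: fitness(c, data))
        f = fitness(champ, data)
        if f >= best:
            parent, best = champ, f
        if best == 0:  # all training utterances classified correctly
            break
    return parent, best
```

With `lam=100` this corresponds to the (1+100)-ES; setting `lam=1` reduces it to the (1+1)-ES (essentially a random-mutation hill climber) that the abstract compares against.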
Keywords:
emotion recognition; feature extraction; learning (artificial intelligence); recurrent neural networks; speech recognition; angry; center-clipping; evaluation; feature pattern; generation of individuals; laugh; normal; pitch analysis; sound data; speech; surprise; data mining; neural networks; pattern recognition; robot sensing systems; speech analysis; telephony