Regional Information Center for Science and Technology (RICEST) - Generating emphasis from neutral speech using hierarchical perturbation model by decision tree and support vector machine

DocumentCode :

2449935

Title :

Generating emphasis from neutral speech using hierarchical perturbation model by decision tree and support vector machine

Author :

Meng, Fanbo ; Wu, Zhiyong ; Meng, Helen ; Jia, Jia ; Cai, Lianhong

Author_Institution :

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear :

2012

fDate :

16-18 July 2012

Firstpage :

442

Lastpage :

448

Abstract :

In a computer-aided pronunciation training (CAPT) system, corrective feedback should provide contrastive comparisons between the user's pronunciation and the canonical one. This paper presents a hierarchical perturbation model that generates emphasis for English by modifying the acoustic features of neutral speech, so that the important speech segments are highlighted. Synthesis of emphasis needs to be realized hierarchically at the word, syllable and phone layers. A two-pass decision tree is constructed to cluster the acoustic variations between emphatic and neutral speech. The questions for decision tree construction are designed according to these layers: questions related to the word and syllable layers are used to construct the main tree, and questions related to the phone layer are then used to expand the leaves of the main tree, deriving a set of subtrees. Support vector machines (SVMs) are used to predict the acoustic variations for all leaves of the main tree (at the word and syllable layers) and of the subtrees (at the phone layer). Experiments indicate that the proposed hierarchical perturbation model can generate emphatic speech of high quality in terms of both naturalness and emphasis.
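
The two-pass tree construction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the question names (`syll_stressed`, `word_final`, `is_vowel`), the single target `delta`, the synthetic data, and the variance-reduction splitting rule. In particular, the mean of each leaf stands in for the SVM predictor that the paper trains per leaf, and the abstract does not say how predictions from the two layers are combined, so the sketch simply returns the phone-layer value.

```python
from statistics import mean

def sse(rows, target):
    """Sum of squared deviations of the target from its mean."""
    if not rows:
        return 0.0
    m = mean(r[target] for r in rows)
    return sum((r[target] - m) ** 2 for r in rows)

def grow(rows, questions, target, min_leaf=2):
    """Greedily grow a binary tree from yes/no questions (variance reduction)."""
    best = None
    for q in questions:
        yes = [r for r in rows if r[q]]
        no = [r for r in rows if not r[q]]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue
        gain = sse(rows, target) - sse(yes, target) - sse(no, target)
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, q, yes, no)
    if best is None:
        return {"leaf": True, "rows": rows}
    _, q, yes, no = best
    rest = [x for x in questions if x != q]
    return {"leaf": False, "q": q,
            "yes": grow(yes, rest, target, min_leaf),
            "no": grow(no, rest, target, min_leaf)}

def leaves(node):
    """Yield every leaf below a node."""
    if node["leaf"]:
        yield node
    else:
        yield from leaves(node["yes"])
        yield from leaves(node["no"])

def two_pass_tree(rows, upper_qs, phone_qs, target):
    """Pass 1: main tree from word/syllable questions.
    Pass 2: expand each main leaf into a subtree using phone questions."""
    main = grow(rows, upper_qs, target)
    for leaf in list(leaves(main)):
        leaf["subtree"] = grow(leaf["rows"], phone_qs, target)
        for sub in leaves(leaf["subtree"]):
            # Leaf mean: a stand-in for the per-leaf SVM predictor.
            sub["value"] = mean(r[target] for r in sub["rows"])
    return main

def descend(node, x):
    """Follow the yes/no answers of sample x down to a leaf."""
    while not node["leaf"]:
        node = node["yes"] if x[node["q"]] else node["no"]
    return node

def predict(main, x):
    """Route x through the main tree, then through that leaf's subtree."""
    return descend(descend(main, x)["subtree"], x)["value"]

# Synthetic, made-up data: two copies of every question combination.
rows = []
for stressed in (False, True):
    for final in (False, True):
        for vowel in (False, True):
            delta = 2.0 * stressed + 1.0 * vowel + 1.0 * (stressed and vowel)
            rows += [{"syll_stressed": stressed, "word_final": final,
                      "is_vowel": vowel, "delta": delta}] * 2

tree = two_pass_tree(rows, ["syll_stressed", "word_final"], ["is_vowel"], "delta")
main_leaf_count = sum(1 for _ in leaves(tree))
print(predict(tree, {"syll_stressed": True, "word_final": False, "is_vowel": True}))
```

Keeping the phone questions out of the first pass means the main tree partitions the data by word and syllable context only; the phone-level detail is then modelled separately inside each of those partitions, which mirrors the layering the abstract describes.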

Keywords :

acoustic signal processing; computer based training; decision trees; linguistics; natural languages; speech processing; speech synthesis; support vector machines; word processing; CAPT system; English; SVM; acoustic feature modification; acoustic variations cluster; canonical pronunciations; computer-aided pronunciation training system; corrective feedback; emphatic speeches; hierarchical perturbation model; neutral speech; phone layer; speech segments; subtrees; support vector machine; syllable layer; two-pass decision tree; user pronunciations; word layer; Acoustics; Decision trees; Feature extraction; Reactive power; Speech; Support vector machines; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Audio, Language and Image Processing (ICALIP), 2012 International Conference on

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0173-2

Type :

conf

DOI :

10.1109/ICALIP.2012.6376658

Filename :

6376658

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2449935