Regional Information Center for Science and Technology - Encoding navigable speech sources: An analysis by synthesis approach

DocumentCode :

3144517

Title :

Encoding navigable speech sources: An analysis by synthesis approach

Author :

Zheng, Xiguang ; Ritz, Christian ; Xi, Jiangtao

Author_Institution :

ICT Res. Inst., Univ. of Wollongong, Wollongong, NSW, Australia

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

405

Lastpage :

408

Abstract :

This paper presents an analysis-by-synthesis coding architecture for compressing navigable speech sources. The proposed coding scheme encodes multiple overlapped speech sources, recorded for example during a multi-participant meeting or teleconference, into a mono or stereo mixture signal that can be compressed with an existing speech coder. The individual speech sources can be separated from the received compressed mixture, which allows the listener to determine the active sources and their spatial locations at the reproduction site. The approach was applied to the compression of a series of speech soundfields created from multiple clean speech sentences and real meeting recordings, where each soundfield contained four participants with up to three simultaneous speech sources. At a total bit rate of 48 kbps, subjective listening tests showed that the perceptual quality of each decoded speech source was significantly better than that of either a non-analysis-by-synthesis (non-a-by-s) approach or separate encoding of each source at the same total bit rate. Subjective listening tests also confirm that the quality of the spatialised speech scene is maintained.
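The abstract does not detail the encoder, but the general analysis-by-synthesis principle it names can be sketched: the encoder runs a local copy of the decoder over candidate parameter sets and transmits the set whose synthesized output is closest to the target. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It uses a hypothetical CELP-style toy decoder (`CODEBOOK`, `GAINS`, and `toy_synthesize` are invented for illustration) and plain mean squared error as the distortion measure, whereas a real speech coder would typically use a perceptually weighted error.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def analysis_by_synthesis_search(
    target: Sequence[float],
    candidates: Iterable[Tuple[int, float]],
    synthesize: Callable[[Tuple[int, float]], Sequence[float]],
) -> Tuple[Tuple[int, float], float]:
    """Closed-loop search: decode every candidate locally and keep the
    one whose synthesized output has the lowest error against the target."""
    best, best_err = None, float("inf")
    for params in candidates:
        err = mse(target, synthesize(params))  # synthesize, then compare
        if err < best_err:
            best, best_err = params, err
    return best, best_err


# Hypothetical toy decoder: output = gain * codebook vector (CELP-style).
CODEBOOK = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
GAINS = [0.5, 1.0, 2.0]


def toy_synthesize(params: Tuple[int, float]) -> list:
    index, gain = params
    return [gain * x for x in CODEBOOK[index]]


if __name__ == "__main__":
    target = [1.0, 2.0, 0.0]
    candidates = product(range(len(CODEBOOK)), GAINS)
    params, err = analysis_by_synthesis_search(target, candidates, toy_synthesize)
    print(params, err)  # (1, 2.0) 0.0
```

The exhaustive loop keeps the example short; practical coders prune the candidate set, but the closed-loop decision rule (choose parameters by the quality of the decoded result rather than by analysing the input alone) is the part the sketch is meant to show.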

Keywords :

speech coding; speech synthesis; teleconferencing; analysis-by-synthesis coding architecture; compressing navigable speech sources; meeting recordings; mono mixture signal; multiparticipant meeting; multiple clean speech sentences; navigable speech source encoding; non-a-by-s approach; overlapped speech sources; received compressed mixture; speech coder; speech soundfields; speech source decoding; stereo mixture signal; synthesis approach; teleconference; Azimuth; Navigation; Speech; Speech coding; Time domain analysis; Time frequency analysis; Multichannel Speech Coding; Soundfield Navigation; Spatial Teleconferencing;

fLanguage :

English

Publisher :

ieee

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6287902

Filename :

6287902

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3144517