DocumentCode :
3144517
Title :
Encoding navigable speech sources: An analysis by synthesis approach
Author :
Zheng, Xiguang ; Ritz, Christian ; Xi, Jiangtao
Author_Institution :
ICT Res. Inst., Univ. of Wollongong, Wollongong, NSW, Australia
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
405
Lastpage :
408
Abstract :
This paper presents an analysis-by-synthesis coding architecture for compressing navigable speech sources. The proposed coding scheme encodes multiple overlapped speech sources recorded, for example, during a multi-participant meeting or teleconference, into a mono or stereo mixture signal that can be compressed with an existing speech coder. The individual speech sources can be separated from the received compressed mixture, which allows the listener to determine the active sources and their spatial locations at the reproduction site. The approach was applied to the compression of a series of speech soundfields created from multiple clean speech sentences and real meeting recordings, where each soundfield contained four participants with up to three simultaneous speech sources. At a total bit rate of 48 kbps, the perceptual quality of each decoded speech source, as judged by subjective listening tests, was found to be significantly better than that of either a non-analysis-by-synthesis approach or separate encoding of each source at the same total bit rate. Subjective listening tests also confirm that the quality of the spatialised speech scene is maintained.
Keywords :
speech coding; speech synthesis; teleconferencing; analysis-by-synthesis coding architecture; compressing navigable speech sources; meeting recordings; mono mixture signal; multiparticipant meeting; multiple clean speech sentences; navigable speech source encoding; non-a-by-s approach; overlapped speech sources; received compressed mixture; speech coder; speech soundfields; speech source decoding; stereo mixture signal; synthesis approach; teleconference; Azimuth; Navigation; Speech; Speech coding; Time domain analysis; Time frequency analysis; Multichannel Speech Coding; Soundfield Navigation; Spatial Teleconferencing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6287902
Filename :
6287902