Regional Information Center for Science and Technology (RICeST) - Wordless Sounds: Robust Speaker Diarization Using Privacy-Preserving Audio Representations

DocumentCode :

35819

Title :

Wordless Sounds: Robust Speaker Diarization Using Privacy-Preserving Audio Representations

Author :

Parthasarathi, Sree Hari Krishnan ; Bourlard, Hervé ; Gatica-Perez, Daniel

Author_Institution :

Int. Comput. Sci. Inst., Berkeley, CA, USA

Volume :

Issue :

fYear :

2013

fDate :

Jan. 2013

Firstpage :

Lastpage :

Abstract :

This paper investigates robust privacy-sensitive audio features for speaker diarization in multiparty conversations, i.e., a set of audio features carrying little linguistic information, in both single and multiple distant microphone scenarios. We systematically investigate the Linear Prediction (LP) residual. Issues such as prediction order and choice of representation of the LP residual are studied. Additionally, we explore the combination of the LP residual with subband information from 2.5 kHz to 3.5 kHz and spectral slope. Next, we propose a supervised framework using a deep neural architecture for deriving privacy-sensitive audio features. We benchmark these approaches against the traditional Mel Frequency Cepstral Coefficient (MFCC) features for speaker diarization in both microphone scenarios. Experiments on the RT07 evaluation dataset show that the proposed approaches yield diarization performance close to that of the MFCC features on the single distant microphone dataset. To objectively evaluate the notion of privacy in terms of linguistic information, we perform human and automatic speech recognition tests, showing that the proposed privacy-sensitive audio features yield much lower recognition accuracies than MFCC features.

Keywords :

data privacy; neural nets; speaker recognition; MFCC feature; audio features; automatic speech recognition test; deep neural architecture; frequency 2.5 kHz to 3.5 kHz; linear prediction; linguistic information; mel frequency cepstral coefficient; multiparty conversation; multiple distant microphone; privacy-preserving audio representation; robust privacy-sensitive audio feature; robust speaker diarization; supervised framework; wordless sounds; Feature extraction; Mel frequency cepstral coefficient; Neural networks; Pragmatics; Privacy; Speech; Speech processing; LP residual; Privacy sensitive audio features; deep neural networks; listening tests; speaker diarization;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2215588

Filename :

6287559

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=35819