Title :
Bag-of-words representation for non-intrusive speech quality assessment
Author :
Qiaohong Li ; Weisi Lin ; Yuming Fang ; Daniel Thalmann
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Research on non-intrusive speech quality assessment (SQA) aims to develop a computational model that simulates human perception of speech signals accurately and automatically, without any prior information about the clean reference speech signals. In this paper, we propose to learn a non-intrusive SQA metric based on a bag-of-words (BoW) representation of speech signals. In particular, the proposed method treats the whole speech utterance as a text document and extracts perceptual linear prediction (PLP) features of local segments as words. The speech utterance is then represented as a histogram of codewords, in which each entry is the probability that a codeword appears in the utterance. Once the BoW representation of a speech signal is obtained, support vector regression (SVR) is used to learn the metric for quality evaluation. Experimental results demonstrate that the proposed non-intrusive SQA metric, BoW, achieves better performance than relevant state-of-the-art metrics.
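The histogram step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the codebook is built or how segments are assigned to codewords, so hard assignment to the nearest codeword under squared Euclidean distance is an assumption here. The function names `nearest_codeword` and `bow_histogram` are hypothetical, and toy 2-D vectors stand in for real PLP feature vectors.

```python
from typing import List, Sequence


def nearest_codeword(frame: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to `frame`.

    Assumption: squared Euclidean distance with hard assignment.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(frames: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]]) -> List[float]:
    """Represent an utterance as the relative frequency of each codeword."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = len(frames)
    if total == 0:
        # An empty utterance yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy stand-ins (assumptions): three codewords and four local-segment
# feature vectors; real inputs would be PLP features.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]]

histogram = bow_histogram(frames, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

In a full pipeline, one such histogram per utterance would serve as the input vector to a regressor such as SVR, trained against subjective quality scores.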
Keywords :
regression analysis; speech processing; support vector machines; text analysis; BoW representation; PLP feature; SQA; SVR; bag-of-words representation; computational model; human perception; nonintrusive speech quality assessment; perceptual linear prediction feature; quality evaluation; reference clean speech signal; speech utterance; support vector regression; text document; Databases; Feature extraction; Histograms; Measurement; Quality assessment; Speech; Speech coding; bag of words; codebook construction; non-intrusive quality assessment; speech quality
Conference_Title :
Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
Conference_Location :
Chengdu
DOI :
10.1109/ChinaSIP.2015.7230477