Title :
Smooth Nonnegative Matrix Factorization for Unsupervised Audiovisual Document Structuring
Author :
Essid, Slim ; Févotte, Cédric
Author_Institution :
LTCI, Inst. Telecom-Telecom ParisTech, Paris, France
Abstract :
This paper introduces a new paradigm for unsupervised audiovisual document structuring. In this paradigm, a novel Nonnegative Matrix Factorization (NMF) algorithm is applied to histograms of counts (derived from a bag-of-features representation of the content) to jointly discover latent structuring patterns and their activations in time. Our NMF variant uses the Kullback-Leibler divergence as its cost function and imposes a temporal smoothness constraint on the activations. It is solved by a majorization-minimization technique. The proposed approach is meant to be generic and is particularly well suited to applications where the structuring patterns may overlap in time. As such, it is evaluated on two person-oriented video structuring tasks (one using the visual modality, the other the audio modality) on a challenging database of political debate videos. Our results outperform reference results obtained by a method using Hidden Markov Models. Further, we show the potential of our general approach for audio speaker diarization.
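For orientation, the kind of factorization the abstract describes can be sketched in Python with NumPy. This is a minimal illustration and not the authors' algorithm: it uses the standard multiplicative updates for NMF under the generalized Kullback-Leibler divergence (which can themselves be derived by majorization-minimization), and it only measures temporal smoothness of the activations with a squared-difference penalty instead of enforcing it in the updates. The function names (`kl_nmf`, `kl_divergence`, `smoothness_penalty`), the form of the penalty, and the toy Poisson data are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    V = np.asarray(V, dtype=float)
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def smoothness_penalty(H):
    """Sum of squared differences between consecutive activation columns.

    Assumed form of a temporal smoothness measure; it is only evaluated
    here, not minimized by the updates below.
    """
    return float(np.sum(np.diff(H, axis=1) ** 2))


def kl_nmf(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Plain KL-NMF with multiplicative updates (no smoothness term).

    V is a nonnegative (features x frames) matrix, e.g. histograms of counts.
    Returns the patterns W, the activations H, and the cost per iteration.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    n_features, n_frames = V.shape
    W = rng.random((n_features, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    costs = []
    for _ in range(n_iter):
        # Update the activations H with the patterns W held fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat)) / (W.sum(axis=0)[:, None] + eps)
        # Update the patterns W with the activations H held fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        costs.append(kl_divergence(V, W @ H))
    return W, H, costs


# Toy usage on synthetic count data (20 histogram bins, 50 time frames).
rng = np.random.default_rng(1)
V = rng.poisson(5.0, size=(20, 50))
W, H, costs = kl_nmf(V, n_components=3)
print("final KL cost:", costs[-1])
print("smoothness penalty of H:", smoothness_penalty(H))
```

In this sketch each column of `V` plays the role of one histogram of counts at a given time, the columns of `W` are the latent patterns, and the rows of `H` are their activations over time. A smooth variant would add a weighted smoothness term to the cost and change the update for `H` accordingly; those update rules are not given in this record and are not reproduced here.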
Keywords :
audio signal processing; document handling; hidden Markov models; matrix decomposition; minimisation; unsupervised learning; video databases; video signal processing; Kullback-Leibler divergence; NMF algorithm; audio modality; audio speaker diarization; cost function; hidden Markov model; histogram-of-count; latent structuring pattern; majorization-minimization technique; person-oriented video structuring task; political debate video database; smooth nonnegative matrix factorization; temporal smoothness constraint; unsupervised audiovisual document structuring; visual modality; Data models; Feature extraction; Histograms; Indexing; Telecommunications; Visualization; Vocabulary; Bag of features; content structuring; indexing; machine learning; matrix factorization; unsupervised classification; videos;
Journal_Title :
Multimedia, IEEE Transactions on
DOI :
10.1109/TMM.2012.2228474