Title :
Topic Models over Spoken Language
Author :
Pansare, Nikhil ; Jermaine, Christopher ; Haas, P. ; Rajput, Neelima
Author_Institution :
Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
Abstract :
Virtually all work on topic modeling has assumed that the topics are to be learned over a text-based document corpus. However, there exist important applications where topic models must be learned over an audio corpus of spoken language. Unfortunately, speech-to-text programs can have very low accuracy. We therefore propose a novel topic model for spoken language that incorporates a statistical model of speech-to-text software behavior. Crucially, our model exploits the uncertainty numbers returned by the software. Our ideas apply to any domain in which it would be useful to build a topic model over data in which uncertainties are explicitly represented.
Keywords :
speech processing; statistical analysis; text analysis; audio corpus; speech-to-text program; speech-to-text software behavior; spoken language; statistical model; text-based document corpus; topic model; Accuracy; Biological system modeling; Computational modeling; Data models; Software; Uncertainty; Vectors; Speech recognition; Text analysis; Uncertain data
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.90