• DocumentCode
    3426238
  • Title
    Towards link characterization from content
  • Author
    Grothendieck, John; Gorin, Allen
  • Author_Institution
    Rutgers Univ., Newark, NJ
  • fYear
    2008
  • fDate
    March 31, 2008 - April 4, 2008
  • Firstpage
    4849
  • Lastpage
    4852
  • Abstract
    In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. Such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting the biased distribution requires exploiting uncertain knowledge of the classifier's error patterns. The Metropolis-Hastings algorithm allows us to construct a Bayes estimator for the true class proportions. We experimentally evaluate this algorithm on a speaker recognition task. In this experiment, the Bayes estimator reduces the maximum RMSE by a factor of five. Performance is also more consistent, with the range of RMSE reduced by a factor of four.
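    The abstract describes correcting a classifier's biased estimate of class proportions with a Bayes estimator built by Metropolis-Hastings sampling. The following is a minimal sketch of that general idea, not the authors' implementation. It rests on assumptions not taken from the paper: the confusion matrix P(decide j | true class i) is treated as known (the paper models classifier error patterns as uncertain), the prior on proportions is Dirichlet, the proposal is a Dirichlet distribution centered near the current state, and all function names and parameters (kappa, burn_in, etc.) are hypothetical. The sampler returns the posterior mean, which is the Bayes estimator under squared-error loss.

```python
import math
import random


def dirichlet_sample(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def dirichlet_logpdf(x, alpha):
    # Log density of Dirichlet(alpha) at a point x on the probability simplex.
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))


def log_target(pi, counts, confusion, prior):
    # Probability that the classifier outputs label j, given true proportions pi
    # (assumption: confusion[i][j] = P(decide j | true i) is known exactly).
    q = [sum(pi[i] * confusion[i][j] for i in range(len(pi)))
         for j in range(len(counts))]
    # Multinomial log-likelihood of the observed decision counts (up to a constant)
    # plus the Dirichlet log-prior on the true proportions.
    loglik = sum(n * math.log(qj) for n, qj in zip(counts, q))
    return loglik + dirichlet_logpdf(pi, prior)


def mh_class_proportions(counts, confusion, prior, n_iter=20000, burn_in=5000,
                         kappa=200.0, seed=0):
    # Hypothetical Metropolis-Hastings sampler over the true class proportions.
    rng = random.Random(seed)
    k = len(prior)
    pi = [1.0 / k] * k                       # start at uniform proportions
    cur = log_target(pi, counts, confusion, prior)
    total = [0.0] * k
    kept = 0
    for t in range(n_iter):
        # Propose from a Dirichlet concentrated near the current state.
        fwd = [kappa * p + 1.0 for p in pi]
        prop = dirichlet_sample(fwd, rng)
        if min(prop) > 0.0:                  # guard against floating-point underflow
            rev = [kappa * p + 1.0 for p in prop]
            new = log_target(prop, counts, confusion, prior)
            # Hastings correction: the proposal is not symmetric.
            log_accept = (new - cur
                          + dirichlet_logpdf(pi, rev)
                          - dirichlet_logpdf(prop, fwd))
            if math.log(1.0 - rng.random()) < log_accept:
                pi, cur = prop, new
        if t >= burn_in:
            for i in range(k):
                total[i] += pi[i]
            kept += 1
    # Posterior mean = Bayes estimate under squared-error loss.
    return [s / kept for s in total]


if __name__ == "__main__":
    # Illustrative numbers only: the classifier labels 30% of items as class 0,
    # but it also confuses class 1 for class 0 20% of the time.
    est = mh_class_proportions([300, 700], [[0.9, 0.1], [0.2, 0.8]], [1.0, 1.0])
    print(est)
```

    In this toy setting the naive estimate for class 0 is 0.30, whereas solving 0.9p + 0.2(1 - p) = 0.3 gives p = 1/7 (about 0.14), and the sampler's posterior mean lands near that corrected value. Modeling uncertainty in the confusion matrix itself, as the abstract indicates, would add the matrix rows to the sampled state; that extension is not shown here.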
  • Keywords
    Bayes methods; speaker recognition; Bayes estimator; Metropolis-Hastings algorithm; language processing; link characterization; pattern classification technology; speaker recognition task; speech processing; Bayesian methods; Error analysis; Humans; Internet; Natural languages; Speech analysis; Speech processing; Statistical distributions; Telecommunication traffic; Uncertainty; Monte Carlo methods; knowledge acquisition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518743
  • Filename
    4518743