• DocumentCode
    1131595
  • Title

    Towards Link Characterization From Content: Recovering Distributions From Classifier Output

  • Author

    Grothendieck, John ; Gorin, Allen

  • Author_Institution
    Dept. of Stat., Rutgers Univ., Piscataway, NJ
  • Volume
    16
  • Issue
    4
  • fYear
    2008
  • fDate
    5/1/2008
  • Firstpage
    847
  • Lastpage
    858
  • Abstract
    In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. It is well known that such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting this biased distribution involves exploiting uncertain knowledge of the classifier error patterns. We describe a numerical method, the Metropolis-Hastings (M-H) algorithm, which provides a Bayes estimator for the distribution. We experimentally evaluate this algorithm for a speaker recognition task, demonstrating a fivefold reduction in root mean squared error.
  • Keywords
    Bayes methods; speech processing; Bayes estimator; Metropolis-Hastings algorithm; language data processing; Diseases; Error correction; Hoses; Humans; Natural languages; Pattern classification; Speaker recognition; Streaming media; Testing; Knowledge acquisition; Monte Carlo methods
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2008.920060
  • Filename
    4489998