  • DocumentCode
    1135178
  • Title
    Tracking Changes in Language
  • Author
    Grothendieck, John
  • Author_Institution
    AT&T Labs.-Res., USA
  • Volume
    13
  • Issue
    5
  • fYear
    2005
  • Firstpage
    700
  • Lastpage
    711
  • Abstract
    One problem that has arisen in recent years is the extraction of useful information from changes in a data stream including natural language. Statistical tests on single word occurrences can reveal many apparent differences. Understanding the reasons behind such changes in the data requires methods for discovering structure within the entire set of individual changed items. This work presents a methodology for understanding how a language model has altered based on utterance clustering and statistical tests on individual features. It further examines clustering of lexical items via profiles of changes in association scores. A machine using an analysis package based on these techniques can isolate novel portions of the data stream. Human inspection of such data then readily determines the nature of the observed change. We investigate several variants of this analysis upon data drawn from an automated call center.
  • Keywords
    data mining; natural languages; speech processing; statistical analysis; analysis package; change detection; information extraction; language model; speech data mining; statistical tests; Customer service; Data analysis; Humans; Inspection; Packaging machines; Speech analysis; Telephony; Testing; clustering
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2005.852087
  • Filename
    1495451