• DocumentCode
    699444
  • Title
    Automatic segmentation and labeling of continuous speech without bootstrapping
  • Author
    Nagarajan, T. ; Murthy, Hema A. ; HemaLatha, N.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, Chennai, India
  • fYear
    2004
  • fDate
    6-10 Sept. 2004
  • Firstpage
    561
  • Lastpage
    564
  • Abstract
    In this paper, a novel approach is proposed for automatically segmenting and transcribing a continuous speech signal without the use of manually segmented and labeled speech corpora. The continuous speech signal is first segmented into syllable-like units by treating its short-term energy as the magnitude spectrum of some arbitrary signal. Similar syllable segments are then grouped together using an unsupervised, incremental clustering technique. A separate model is generated for each cluster of syllable segments. At this stage, labels are assigned manually to each group of syllable segments. The syllable models of these clusters are then used to transcribe/recognize the continuous speech signal of closed-set speakers as well as open-set speakers. As a syllable recognizer, our initial results on Indian television news bulletins in the languages Tamil and Telugu show that the performance is 43.3% and 32.9%, respectively.
  • Keywords
    natural language processing; pattern clustering; speaker recognition; unsupervised learning; Indian television news bulletins; Tamil; Telugu; arbitrary signal; automatic continuous speech labeling; automatic continuous speech segmentation; closed-set speakers; continuous speech signal recognition; continuous speech signal transcription; incremental clustering technique; magnitude spectrum; open-set speakers; short-term energy; similar syllable segments; syllable-like units; unsupervised clustering technique; Convergence; Gold; Labeling; Speech; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2004 12th European
  • Conference_Location
    Vienna
  • Print_ISBN
    978-320-0001-65-7
  • Type
    conf
  • Filename
    7079974