• DocumentCode
    2851224
  • Title
    Audio Indexing of Arabic broadcast news
  • Author
    Billa, J. ; Noamany, M. ; Srivastava, A. ; Liu, D. ; Stone, R. ; Xu, J. ; Makhoul, J. ; Kubala, F.
  • Author_Institution
    BBN Technologies, Cambridge MA 02138, USA
  • Volume
    1
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    This paper describes the development of the BBN Audio Indexing System for broadcast news in Arabic. Key issues addressed in this work revolve around the three major components of the audio indexing system: automatic speech recognition, speaker identification, and named entity identification. The system deals with several challenges introduced by the Arabic language, including the absence of short vowels in written text and the presence of compound words that are formed by the concatenation of certain conjunctions, prepositions, articles, and pronouns, as prefixes and suffixes to the word stem. The lack of short vowels in the transcripts prompted a novel solution that further demonstrated the power of hidden Markov models to deal with ambiguity. Another challenge was the acquisition of appropriate language modeling data, given the absence of broadcast news data for that purpose. We present performance results for all three components of the Audio Indexing System, which we believe represent the state of the art for Arabic broadcast news.
  • Keywords
    Biomedical monitoring; Electric breakdown; Error analysis; Indexing; Speech recognition; TV; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2002.5743640
  • Filename
    5743640