• DocumentCode
    2822122
  • Title
    Automatic Extraction of Spoken Word in Broadcast Media Language
  • Author
    Zhang, Yuqiang ; Zou, Yu ; He, Wei ; Hou, Min ; Teng, Yonglin
  • Author_Institution
    Broadcast Media Language Res. Center, Commun. Univ. of China, Beijing, China
  • Volume
    2
  • fYear
    2009
  • fDate
    24-26 April 2009
  • Firstpage
    403
  • Lastpage
    405
  • Abstract
    Compared with the written word, the spoken word has received little attention from researchers because spoken corpora are difficult to obtain. To advance research on spoken words, this paper proposes a novel method for automatically extracting spoken words from broadcast media language. Using a model based on the spatial distribution of word usage frequency, we extracted 3009 spoken words, with a correct extraction rate of over 85% on the part I data and 76.5% on the part II data. These results indicate that the model can effectively extract and distinguish spoken words in broadcast media language.
  • Keywords
    information retrieval; speech processing; word processing; automatic spoken word extraction; broadcast media language; spatial distribution model; spoken corpora; word usage frequency; Data mining; Frequency; Helium; Large-scale systems; Logistics; Natural languages; Radio broadcasting; Speech; TV broadcasting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Sciences and Optimization, 2009. CSO 2009. International Joint Conference on
  • Conference_Location
    Sanya, Hainan
  • Print_ISBN
    978-0-7695-3605-7
  • Type
    conf
  • DOI
    10.1109/CSO.2009.82
  • Filename
    5193982