Title :
Automatic Extraction of Spoken Word in Broadcast Media Language
Author :
Zhang, Yuqiang ; Zou, Yu ; He, Wei ; Hou, Min ; Teng, Yonglin
Author_Institution :
Broadcast Media Language Res. Center, Commun. Univ. of China, Beijing, China
Abstract :
Compared with the written word, the spoken word has received little attention from researchers because spoken corpora are difficult to obtain. To advance research on spoken words, this paper proposes a novel method for automatically extracting spoken words from broadcast language. Using a model based on the spatial distribution of word usage frequency, we extracted 3009 spoken words, with a correct extraction rate of over 85% on the Part I data and 76.5% on the Part II data. The results indicate that this model can effectively identify and extract spoken words from broadcast media language.
Keywords :
information retrieval; speech processing; word processing; automatic spoken word extraction; broadcast media language; spatial distribution model; spoken corpora; word usage frequency; Data mining; Frequency; Helium; Large-scale systems; Logistics; Natural languages; Radio broadcasting; Speech; TV broadcasting
Conference_Title :
Computational Sciences and Optimization, 2009. CSO 2009. International Joint Conference on
Conference_Location :
Sanya, Hainan
Print_ISBN :
978-0-7695-3605-7
DOI :
10.1109/CSO.2009.82