DocumentCode
2822122
Title
Automatic Extraction of Spoken Word in Broadcast Media Language
Author
Zhang, Yuqiang ; Zou, Yu ; He, Wei ; Hou, Min ; Teng, Yonglin
Author_Institution
Broadcast Media Language Res. Center, Commun. Univ. of China, Beijing, China
Volume
2
fYear
2009
fDate
24-26 April 2009
Firstpage
403
Lastpage
405
Abstract
Compared with the written word, the spoken word has received little attention from researchers because spoken corpora are difficult to obtain. To advance research on spoken words, this paper proposes a novel method for automatically extracting spoken words from broadcast language. Using a model of the spatial distribution of word usage frequency, we extracted 3009 spoken words, with a correct extraction rate of over 85% on the part I data and 76.5% on the part II data. The analysis indicates that this model can effectively extract and distinguish spoken words in broadcast media language.
Keywords
information retrieval; speech processing; word processing; automatic spoken word extraction; broadcast media language; spatial distribution model; spoken corpora; word usage frequency; Data mining; Frequency; Helium; Large-scale systems; Logistics; Natural languages; Radio broadcasting; Speech; TV broadcasting;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Sciences and Optimization, 2009. CSO 2009. International Joint Conference on
Conference_Location
Sanya, Hainan
Print_ISBN
978-0-7695-3605-7
Type
conf
DOI
10.1109/CSO.2009.82
Filename
5193982