DocumentCode :
2691054
Title :
Unsupervised Training on a Large Amount of Arabic Broadcast News Data
Author :
Ma, Jiaxin ; Matsoukas, Spyros
Author_Institution :
BBN Technol., Cambridge, MA, USA
Volume :
2
fYear :
2007
fDate :
15-20 April 2007
Abstract :
The unsupervised training we carried out on 1,858 hours of untranscribed Arabic broadcast news (BN) data yields a sizable gain. However, this gain is only about half of that achieved on 1,900 hours of English BN data. This paper presents our efforts to enlarge the gain on the Arabic data. These efforts include the design of an explicit hypothesis-confidence-estimating method for data selection, the use of new features and neural networks (NN) to improve hypothesis-confidence estimation, and alleviation of the over-fitting problem in the estimation. Our experiments show that both the explicit hypothesis-confidence-estimating method and the new features improve the estimation and yield additional gains from unsupervised training; the use of neural networks does not significantly improve the confidence estimation; and the alleviation of the over-fitting problem is not significant enough to decrease the word error rate (WER). This paper also presents improvements from unsupervised training on a morpheme-based Arabic system and on models trained with the maximum mutual information (MMI) criterion.
Keywords :
natural language processing; neural nets; radio broadcasting; speech recognition; unsupervised learning; English BN data; explicit hypothesis-confidence-estimation method; maximum mutual information criterion; morpheme-based Arabic system; neural networks; over-fitting problem; unsupervised training; untranscribed Arabic broadcast news data; word error rate; Broadcast technology; Broadcasting; Design methodology; Error analysis; Inspection; Mutual information; Natural languages; Neural networks; Radio access networks; Speech; Arabic broadcast news; confidence estimation; speech recognition; unsupervised training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.366244
Filename :
4217417