DocumentCode :
550006
Title :
Automatic speech to text transformation of spontaneous job interviews on the HuComTech database
Author :
Szaszák, György ; Tündik, Ákos Máté ; Vicsi, Klára
Author_Institution :
Dept. of Telecommun. & Media Inf., Budapest Univ. of Technol. & Econ., Budapest, Hungary
fYear :
2011
fDate :
7-9 July 2011
Firstpage :
1
Lastpage :
4
Abstract :
Automatic recognition of spontaneous speech (speech-to-text transformation) is one of the most challenging tasks today. Whilst the recognition of read or formal speech (e.g. dictation) is possible for several languages with accuracy rates high enough to allow commercial exploitation, the automatic recognition of spontaneous speech is a harder task, yielding at most about 50% accuracy. This is due to the characteristics of spontaneous speech, which shows 'irregular' behaviour: insertions, abbreviations, truncations, uncompleted sentences, and higher variability of both the acoustic and the linguistic features of speech. In this paper, automatic speech recognition of spontaneous and semi-spontaneous speech is evaluated and compared, using two speech databases to train the acoustic models and using statistical language models adapted for spontaneous speech and covering spontaneous and semi-spontaneous job interview tasks. The speech databases involved are the HuComTech multi-modal database (training of acoustic and language models for spontaneous speech) and the Hungarian Reference Speech Database (training of non-spontaneous acoustic models). Results show, as expected, that the recognition of spontaneous speech is less effective even with adapted language models, and that considerable differences in speech recognition accuracy can be traced back to both the acoustic and the language models, depending on the type and characteristics of the database they come from.
Keywords :
audio databases; formal languages; natural language processing; speech recognition; text analysis; HuComTech multimodal database; Hungarian reference speech database; acoustic models; automatic speech recognition; automatic speech to text transformation; formal speech; language models; linguistic features; semispontaneous job interview tasks; semispontaneous speech recognition; spontaneous speech recognition; statistical language models; Accuracy; Acoustics; Databases; Hidden Markov models; Speech; Speech recognition; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cognitive Infocommunications (CogInfoCom), 2011 2nd International Conference on
Conference_Location :
Budapest
Print_ISBN :
978-1-4577-1806-9
Electronic_ISBN :
978-963-8111-78-4
Type :
conf
Filename :
5999476