Regional Information Center for Science and Technology - Automatic speech to text transformation of spontaneous job interviews on the HuComTech database

DocumentCode :

550006

Title :

Automatic speech to text transformation of spontaneous job interviews on the HuComTech database

Author :

Szaszák, György ; Tündik, Ákos Máté ; Vicsi, Klára

Author_Institution :

Dept. of Telecommun. & Media Inf., Budapest Univ. of Technol. & Econ., Budapest, Hungary

fYear :

2011

fDate :

7-9 July 2011

Firstpage :

Lastpage :

Abstract :

Automatic recognition of spontaneous speech (speech to text transformation) is one of the most challenging tasks today. Whilst read or formal speech (e.g. dictation) can be recognised for several languages with accuracy high enough to allow commercial exploitation, the automatic recognition of spontaneous speech is a harder task, yielding at most about 50% accuracy. This is due to the characteristics of spontaneous speech, which shows 'irregular' behaviour: insertions, abbreviations, truncations, incomplete sentences, and higher variability of both the acoustic and the linguistic features of speech. In this paper, automatic speech recognition of spontaneous and semi-spontaneous speech is evaluated and compared, using two speech databases to train the acoustic models and using statistical language models adapted for spontaneous speech and covering spontaneous and semi-spontaneous job interview tasks. The speech databases involved are the HuComTech multi-modal database (training of acoustic and language models for spontaneous speech) and the Hungarian Reference Speech Database (training of non-spontaneous acoustic models). As expected, the results show that the recognition of spontaneous speech is less effective even with adapted language models, and that a considerable difference in speech recognition accuracy can be traced back to both the acoustic and the language models, depending on the type and characteristics of the database they come from.

Keywords :

audio databases; formal languages; natural language processing; speech recognition; text analysis; HuComTech multimodal database; Hungarian reference speech database; acoustic models; automatic speech recognition; automatic speech to text transformation; formal speech; language models; linguistic features; semispontaneous job interview tasks; semispontaneous speech recognition; spontaneous speech recognition; statistical language models; Accuracy; Acoustics; Databases; Hidden Markov models; Speech; Speech recognition; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cognitive Infocommunications (CogInfoCom), 2011 2nd International Conference on

Conference_Location :

Budapest

Print_ISBN :

978-1-4577-1806-9

Electronic_ISBN :

978-963-8111-78-4

Type :

conf

Filename :

5999476

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=550006