Spoken word recognition system for unlimited adult male speakers

Abstract:

An online automatic spoken word recognition system has been developed for research on automatic speech recognition. In this system, the spoken word is first frequency-analysed with a filter bank of single-tuned, low-selectivity filters. Three major local peaks in the spectrum and the amplitude of the speech wave are extracted every 10 ms. The frequencies of two local peaks are used to classify the vowels; the frequencies of all three local peaks, their movements, and the amplitude are used to classify the semi-vowels and consonants. The input speech is thus transformed into a sequence of notations, one every 10 ms, each expressing a phoneme or phoneme group. This sequence is then transformed into possible phonemic strings, henceforth called input words, which are convenient for comparison with the contents of the dictionary. The Hamming distance between each input word and each dictionary item is computed, with the notations of phonemes and phoneme groups expressed as 9-bit binary vectors. The dictionary item nearest to any one of the input words is selected as the output of the recognition system. Experiments were first carried out with five speakers, from whose utterances the standard patterns for the P1, P2 and P3 distributions had been made. The recognition score was 96% for 20 city names involving all kinds of phonemes. When the speech samples were increased to 166 city names, 82% of the utterances of three speakers were correctly recognized after the possible combinations of phonemes were added to every word. Finally, 13 different speakers uttered 51 city names with large distances between one another; the recognition score was 94% when the speakers were permitted to repeat their utterances up to three times.
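The dictionary-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 9-bit codes, the city labels, and the function names `hamming`, `word_distance`, and `recognize` are hypothetical. The abstract does not say how words of different lengths are compared, so this sketch assumes that only candidates with the same number of notations as a dictionary entry are compared with it.

```python
from typing import Dict, List, Optional, Sequence, Tuple

BITS = 9  # the abstract states that each notation is a 9-bit binary vector


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two 9-bit codes differ."""
    mask = (1 << BITS) - 1
    return bin((a ^ b) & mask).count("1")


def word_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Sum of per-notation Hamming distances (assumes equal lengths)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(hamming(a, b) for a, b in zip(x, y))


def recognize(
    input_words: List[Sequence[int]],
    dictionary: Dict[str, Sequence[int]],
) -> Tuple[Optional[str], Optional[int]]:
    """Return the dictionary item nearest to any one of the input words."""
    best_name: Optional[str] = None
    best_dist: Optional[int] = None
    for name, entry in dictionary.items():
        for candidate in input_words:
            if len(candidate) != len(entry):
                continue  # assumption: skip candidates of a different length
            d = word_distance(candidate, entry)
            if best_dist is None or d < best_dist:
                best_name, best_dist = name, d
    return best_name, best_dist


# Hypothetical codes, for illustration only.
dictionary = {
    "city_A": [0b000000111, 0b111000000],
    "city_B": [0b101010101, 0b010101010],
}
input_words = [
    [0b000000110, 0b111000001],  # one possible phonemic string
    [0b101010101, 0b000000000],  # an alternative string for the same utterance
]
result = recognize(input_words, dictionary)
```

With these made-up codes, the first candidate differs from `city_A` in two bits in total, which is the smallest distance over all candidate and entry pairs, so `city_A` is selected.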