Regional Information Center for Science and Technology - The recognition of nasal-stop distinction in adverse listening conditions

Record number:

1017824

Article title:

بازشناسي تمايز خيشومي-انسدادي در شرايط نامطلوب شنيداري

Title in another language:

The recognition of nasal-stop distinction in adverse listening conditions

Authors:

Mahmoodzadeh, Zahra; Iranian Research Institute for Information Science and Technology (IranDoc), Department of Linguistics

Number of pages:

From page:

To page:

Keywords:

adverse listening conditions, Persian numerals, perception test, nasal-stop distinction, vowel transition

Persian abstract:

To improve the automatic recognition of phonological distinctions, one can use the perceptual cues that listeners rely on for natural recognition in adverse listening conditions such as telephone speech or environmental noise. In this study, perception tests were used to find the perceptual cues that are effective in the natural recognition of the nasal-stop distinction in the numeral pair "two-nine" [do]-[noh] under the adverse conditions of telephone speech. Examination of the acoustic signal of [no] shows that the vowel-transition and nasal-murmur cues, affected by various sources of signal disruption, are acoustically reduced and cause ambiguity in the perception of the nasal [n]. In optimal listening conditions, the accuracy of natural recognition of [n] based on the nasal-murmur parameter alone is about 40 percent. However, when 10 milliseconds from the beginning of the vowel transitions are added to it, recognition accuracy rises to 96 percent. In telephone speech, the accuracy of natural recognition is 29 percent based on nasal murmur and only 48 percent based on both parameters. It appears that the listener's lexical uncertainty and ambiguity, caused by the deletion or reduction of phonetic information in adverse listening conditions on the one hand, and the listener's perceptual bias toward the unmarked consonant [d] on the other, have led to reduced recognition of the nasal [n]. According to the findings, the acoustic parameter of nasal murmur does not play an effective role in the natural recognition of the nasal in either optimal or adverse listening conditions, and recognition increases significantly only when formant-transition information is added. Therefore, the automatic recognition of these words requires the use of temporal and spectral information from the adjacent vowels and their transitions.

English abstract:

The automatic recognition of the Persian numerals [sefr-se] “zero-three”, [do-noh] “two-nine” and [haft-haʃt] “seven-eight” is considered a challenge for speech recognition systems. Mahmoodzadeh and Bahrani (2014) found that the acoustic reduction of telephone speech triggers lexical ambiguity in the automatic recognition of the [sefr-se] pair. The numeral [sefr] is produced with deletion of [r] in word-final position, and the weak labial friction of [f] is masked by channel noise, which increases the acoustic similarity between [sefr] and [se]. The automatic recognition of phonological distinctions can be improved by using the perceptual cues that listeners apply for the natural recognition of sounds in adverse listening conditions such as telephone speech or noisy environments. In this research, perception tests were used to discover the perceptual cues that are effective in distinguishing the numerals [do]-[noh] “two-nine” in both natural and telephone speech. The acoustic signal of [noh] shows a weak or practically deleted final [h], which is neither audible nor distinguishable from background noise. Therefore, the acoustic differences of the nasal-stop distinction and the co-articulatory effects of the nasal on the following vowel play an important role in [do-no] recognition. Results show that the acoustic landmarks, namely nasal murmur and nasal-vowel transitions, are affected by various sources of signal disruption and undergo phonetic reduction, which leads to perceptual similarity of the [do-noh] pair and ultimately to listeners’ lexical ambiguity. In optimal listening conditions, natural recognition of [n] based on nasal murmur is about 40%, which increases to 96% after the addition of 10 ms from the beginning of the vowel [o]. In telephone speech, however, natural recognition of [n] based on murmur is about 29%, and after the addition of 10 ms of the vowel [o] transitions it rises to only about 48%.
According to these outcomes, nasal murmur is not an influential perceptual cue for the recognition of [n] in either optimal or adverse listening conditions; however, recognition rises significantly after adding only 10 ms from the beginning of the following vowel. It is likely that listeners’ lexical confusion and uncertainty, due to the loss or reduction of phonetic information and reinforced by listeners’ perceptual bias toward the unmarked consonant, give rise to [d] responses and to the failure of [n] identification in adverse listening conditions. The automatic recognition of these words should therefore capture further temporal and spectral information from the neighboring vowel transitions.

Publication year:

1397

Journal title:

Zabanpazhuhi (زبان پژوهي)

PDF file:

7500219

Link to this record:

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=8&DC=1017824