Identifying Language Origin of Person Names With N-Grams of Different Units

Author

Chen, Yining ; You, Jiali ; Chu, Min ; Zhao, Yong ; Wang, JinLin

Author_Institution

Microsoft Res. Asia, Beijing

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

Identifying the language origin of a name in English is important for generating its correct pronunciation. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of an N-gram model over a set of frequently used letter clusters (corresponding to syllables) is compared with that of a letter N-gram model in a four-language task: English, German, French, and Portuguese. On average, the letter cluster N-gram, which has a 26% error rate, is slightly better than the letter N-gram, which has a 27.2% error rate. Furthermore, the error distributions of the two N-grams are found to differ considerably. Therefore, AdaBoost is used to combine the results from N-grams of different units. After the combination, the error rate is reduced to 22.5%, a relative error reduction of 17.5%.
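
As a rough illustration of the letter N-gram baseline mentioned in the abstract, the following is a minimal sketch of a per-language letter bigram model that assigns a name to the language with the highest log-likelihood. It is not the authors' implementation: the add-one smoothing, the boundary markers, the 27-symbol alphabet size, the class and function names, and the tiny name lists are all assumptions made for this example. The letter-cluster units and the AdaBoost combination are not shown.

```python
import math
from collections import Counter


def ngrams(name, n=2):
    """Return the letter n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class LetterNGramModel:
    """Letter n-gram model for one language (hypothetical helper class)."""

    def __init__(self, names, n=2, vocab_size=27):
        # vocab_size is an assumption: letters a-z plus the end marker "$".
        self.n = n
        self.vocab_size = vocab_size
        self.counts = Counter(g for name in names for g in ngrams(name, n))
        # Count how often each (n-1)-letter context was seen.
        self.context = Counter()
        for gram, count in self.counts.items():
            self.context[gram[:-1]] += count

    def log_prob(self, name):
        """Sum of add-one smoothed log-probabilities of the name's n-grams."""
        total = 0.0
        for gram in ngrams(name, self.n):
            numerator = self.counts[gram] + 1
            denominator = self.context[gram[:-1]] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def identify(name, models):
    """Return the language whose model gives the name the highest score."""
    return max(models, key=lambda lang: models[lang].log_prob(name))


# Toy training data, invented for illustration only (not from the paper).
TRAINING_NAMES = {
    "english": ["smith", "johnson", "williams", "brown", "taylor"],
    "german": ["schmidt", "schneider", "fischer", "schwarz", "hoffmann"],
    "french": ["dubois", "moreau", "lefebvre", "rousseau", "beaulieu"],
    "portuguese": ["silva", "santos", "oliveira", "pereira", "carvalho"],
}

MODELS = {lang: LetterNGramModel(names) for lang, names in TRAINING_NAMES.items()}
```

Calling `identify("schmidt", MODELS)` scores the name under each of the four toy models and returns the best-scoring language label. With lists this small the result is only meaningful for names that closely resemble the training examples.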

Keywords

natural languages; speech recognition; speech synthesis; English; French; German; N-gram model; Portuguese; error reduction; language origin identification; person names; syllable-based letter clusters; Acoustics; Asia; Engines; Error analysis; HTML; Natural languages; Speech recognition; Speech synthesis; Vocabulary; Watches

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660124

Filename

1660124