Title : 
Identifying Language Origin of Person Names With N-Grams of Different Units
         
Author :
Chen, Yining; You, Jiali; Chu, Min; Zhao, Yong; Wang, JinLin
         
Author_Institution :
Microsoft Res. Asia, Beijing
         
Abstract :
Identifying the language origin of a name in English is important for generating its correct pronunciation. In this paper, N-grams of syllable-based letter clusters are proposed for the task. The performance of an N-gram model over a set of frequently used letter clusters (corresponding to syllables) is compared with that of a letter N-gram model on a four-language task: English, German, French, and Portuguese. On average, the letter-cluster N-gram (26% error rate) is slightly better than the letter N-gram (27.2% error rate). Furthermore, the error distributions of the two N-grams differ considerably. Therefore, AdaBoost is used to combine the results from N-grams of different units. After the combination, the error rate is reduced to 22.5%, a relative error reduction of 17.5%.
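A minimal sketch of N-gram language identification over two kinds of units, assuming add-one smoothed bigram models, a tiny hypothetical training list, and a hypothetical hard-coded cluster inventory. The abstract does not state the paper's N, smoothing method, training data, or how its syllable-based clusters are derived, so every name here (TRAIN, CLUSTERS, letters, clusters, NGramModel, train_all, identify) is illustrative only. Each language gets one model per unit type, and a name is assigned to the language whose model gives it the highest log-likelihood. The AdaBoost combination step is not shown.

```python
import math
from collections import defaultdict

# Hypothetical toy training data (a few names per language); illustrative only.
TRAIN = {
    "English": ["smith", "johnson", "williams", "brown", "taylor"],
    "German": ["schmidt", "schneider", "fischer", "weber", "wagner"],
    "French": ["dubois", "moreau", "lefebvre", "rousseau", "girard"],
    "Portuguese": ["silva", "santos", "oliveira", "pereira", "costa"],
}

# Hypothetical cluster inventory; a stand-in for syllable-based letter clusters.
CLUSTERS = {"sch", "ei", "eau", "ou", "oi", "th", "ch", "ss", "ll", "ira", "er"}


def letters(name):
    """Split a name into single-letter units."""
    return list(name)


def clusters(name, inventory=CLUSTERS, max_len=3):
    """Greedy longest-match split into clusters, falling back to single letters."""
    units, i = [], 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            if size == 1 or piece in inventory:
                units.append(piece)
                i += size
                break
    return units


class NGramModel:
    """Add-one smoothed N-gram model over whatever units `segment` produces."""

    def __init__(self, n, segment):
        self.n, self.segment = n, segment
        self.counts = defaultdict(int)   # counts of full n-grams
        self.history = defaultdict(int)  # counts of (n-1)-unit histories
        self.vocab = set()               # units that can be predicted

    def _padded(self, name):
        return ["<s>"] * (self.n - 1) + self.segment(name) + ["</s>"]

    def train(self, names):
        for name in names:
            seq = self._padded(name)
            self.vocab.update(seq[self.n - 1:])
            for i in range(self.n - 1, len(seq)):
                hist = tuple(seq[i - self.n + 1:i])
                self.counts[hist + (seq[i],)] += 1
                self.history[hist] += 1

    def log_prob(self, name):
        seq = self._padded(name)
        v = len(self.vocab) + 1  # one extra slot for unseen units
        total = 0.0
        for i in range(self.n - 1, len(seq)):
            hist = tuple(seq[i - self.n + 1:i])
            num = self.counts.get(hist + (seq[i],), 0) + 1
            den = self.history.get(hist, 0) + v
            total += math.log(num / den)
        return total


def train_all(n, segment):
    """Train one model per language for a given unit type."""
    models = {}
    for lang, names in TRAIN.items():
        models[lang] = NGramModel(n, segment)
        models[lang].train(names)
    return models


def identify(models, name):
    """Return the language whose model gives the name the highest log-likelihood."""
    return max(models, key=lambda lang: models[lang].log_prob(name))


letter_models = train_all(2, letters)
cluster_models = train_all(2, clusters)
```

On this toy data, identify(letter_models, "schmidt") returns "German". Swapping letters for clusters changes only the unit sequence fed to the same model class (for example, "schmidt" becomes ["sch", "m", "i", "d", "t"]), which is what lets the two unit types make different errors that a combiner can then exploit.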
         
Keywords :
natural languages; speech recognition; speech synthesis; English; French; German; N-gram model; Portuguese; error reduction; language origin identification; person names; syllable-based letter clusters; Acoustics; Asia; Engines; Error analysis; HTML; Natural languages; Speech recognition; Speech synthesis; Vocabulary; Watches
         
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
         
Conference_Location :
Toulouse
         
Print_ISBN :
1-4244-0469-X
         
DOI :
10.1109/ICASSP.2006.1660124