Title : 
Spoken Proper Name Retrieval in Audio Streams for Limited-Resource Languages Via Lattice Based Search Using Hybrid Representations
         
        
            Author : 
Akbacak, Murat ; Hansen, John H L
         
        
            Author_Institution : 
Center for Robust Speech Systems, University of Texas at Dallas, TX
         
        
        
        
        
            Abstract : 
Research in multilingual speech recognition has shown that current speech recognition technology generalizes across different languages, and that similar modeling assumptions hold, provided that linguistic knowledge (e.g., phoneme inventory, pronunciation dictionary) and transcribed speech data are available for the target language. Linguists conservatively estimate that 4000 languages are spoken in the world today, and for many of these languages, very limited linguistic knowledge and speech data/resources are available. Rapid transition to a new target language becomes a practical concern within the concept of tiered resources. In this study, we present our research efforts towards multilingual spoken information retrieval with limitations in acoustic training data. We propose different retrieval algorithms that use a lattice-based search to leverage existing resources from resource-rich languages as well as from the target language. We use Latin-American Spanish as the target language. After searching for queries consisting of Spanish proper names in Spanish Broadcast News data, we obtain performance (max-F value of 28.3%) close to that of a Spanish-based system (trained on speech data from 36 speakers) using only 25% of all the available speech data from the original target language.
         
        
            Keywords : 
information retrieval; natural languages; speech recognition; Latin-American Spanish; acoustic training data; audio streams; lattice based search; limited-resource languages; multilingual speech recognition; resource-rich languages; spoken proper name retrieval; transcribed speech data; Broadcasting; Dictionaries; Information retrieval; Lattices; Natural languages; Robustness; Speech recognition; Streaming media; Target recognition; Training data;
         
        
        
        
            Conference_Title : 
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
         
        
            Conference_Location : 
Toulouse
         
        
        
            Print_ISBN : 
1-4244-0469-X
         
        
        
            DOI : 
10.1109/ICASSP.2006.1660180