Title : 
An efficient method of language identification using LVQ network

Author :
Xiao, Han ; Yu, Lei ; Chen, Kai

Author_Institution :
School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing

Abstract :
This paper presents a new method for language identification based on an LVQ (learning vector quantization) network. The feature vector combines the presence of particular characters, the presence of particular words, and statistical information about word lengths. The new classification technique is faster than the conventional N-gram-based approach while achieving a similar correct classification rate. In an identification experiment with 8 Roman-alphabet languages, the LVQ network achieved a 97.6% correct classification rate using 500 bytes of input text and ran five times faster than the N-gram-based approach.
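
The abstract names the ingredients (character presence, word presence, word-length statistics, an LVQ classifier) but not their exact settings, so the following is only a minimal sketch under stated assumptions. The marker characters and words, the 10-bin word-length histogram, the choice of the basic LVQ1 update rule, one prototype per class, the decaying learning rate, the toy training sentences, and the names features, train_lvq1, classify and TRAIN are all hypothetical illustrations, not the authors' method.

```python
import random
import re

# Hypothetical marker lists; the abstract does not give the paper's actual characters or words.
MARKER_CHARS = ["ä", "ö", "ü", "ß", "é", "è", "ç", "ñ", "ã", "å", "ø"]
MARKER_WORDS = ["the", "and", "der", "die", "und", "le", "la", "et",
                "el", "los", "il", "che", "och"]
MAX_WORD_LEN = 10  # lengths >= 10 share the last histogram bin (assumption)


def features(text):
    """Concatenate character presence, word presence and a word-length distribution."""
    lower = text.lower()
    words = re.findall(r"[^\W\d_]+", lower)  # runs of Unicode letters
    vec = [1.0 if c in lower else 0.0 for c in MARKER_CHARS]
    present = set(words)
    vec += [1.0 if w in present else 0.0 for w in MARKER_WORDS]
    hist = [0.0] * MAX_WORD_LEN
    for w in words:
        hist[min(len(w), MAX_WORD_LEN) - 1] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero on empty input
    vec += [h / total for h in hist]
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_lvq1(samples, labels, protos_per_class=1, lr=0.1, epochs=30, seed=0):
    """LVQ1: move the nearest prototype toward a same-class sample, away otherwise."""
    rng = random.Random(seed)
    prototypes = []  # (weight vector, class label) pairs
    for lab in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == lab]
        for s in rng.sample(members, min(protos_per_class, len(members))):
            prototypes.append((list(s), lab))  # initialise from training samples
    order = list(range(len(samples)))
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w, w_lab = min(prototypes, key=lambda p: sq_dist(p[0], x))
            sign = 1.0 if w_lab == y else -1.0
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])  # updates the stored vector in place
    return prototypes


def classify(prototypes, text):
    """Return the label of the prototype nearest to the text's feature vector."""
    x = features(text)
    return min(prototypes, key=lambda p: sq_dist(p[0], x))[1]


# Toy training set (illustrative only; not the paper's corpus).
TRAIN = [
    ("the cat and the dog sat on the mat", "en"),
    ("the house and the garden are big", "en"),
    ("der hund und die katze schlafen", "de"),
    ("die straße ist lang und der weg ist kurz", "de"),
    ("le chat et la souris mangent", "fr"),
    ("la maison est très belle et le jardin aussi", "fr"),
    ("el perro y los gatos comen", "es"),
    ("los niños juegan en el parque", "es"),
]

if __name__ == "__main__":
    X = [features(t) for t, _ in TRAIN]
    y = [lab for _, lab in TRAIN]
    protos = train_lvq1(X, y)
    for s in ["the bird and the tree", "die katze und der hund"]:
        print(s, "->", classify(protos, s))
```

Classification here is a nearest-prototype lookup, so its cost grows with the number of prototypes and the feature length rather than with an N-gram profile size; this is one plausible reading of the speed advantage the abstract reports, not a claim about the paper's implementation.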

Keywords :
classification; feature extraction; learning (artificial intelligence); natural languages; text analysis; vector quantisation; Roman alphabet languages; feature extraction; feature vector; language identification; learning vector quantization; word lengths; Books; Data mining; Feature extraction; Frequency; Natural languages; Organizing; Statistical distributions; Statistics; Vector quantization; Web and internet services;

Conference_Title :
9th International Conference on Signal Processing (ICSP 2008), 2008

Conference_Location :
Beijing

Print_ISBN :
978-1-4244-2178-7

Electronic_ISBN :
978-1-4244-2179-4

DOI :
10.1109/ICOSP.2008.4697462