Title : 
Which performs better for new word detection, character based or Chinese Word Segmentation based?
         
        
            Author : 
Haijun Zhang ; Shumin Shi
         
        
            Author_Institution : 
Sch. of Comput. Sci. & Technol., Xinjiang Normal Univ., Urumqi, China
         
        
        
        
        
        
            Abstract : 
This paper proposes a novel method for evaluating the performance of New Word Detection (NWD) based on repeats extraction. For small-scale corpora, we employ Conditional Random Fields (CRF) as the statistical framework to estimate the effects of different NWD strategies. For large-scale corpora, comparative experiments cannot be carried out because annotated corpora are limited in size. Accordingly, this paper proposes a pragmatic quantitative model to analyze and estimate NWD performance in all cases, especially for large-scale corpora. The experimental results and the conclusions drawn from the quantitative model corroborate each other. Based on the analysis of the experimental data and the quantitative model, we reach a reliable conclusion about the effects of the two strategies (character based and Chinese Word Segmentation based) on Chinese NWD, which can guide follow-up studies in Chinese new word detection.
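
The abstract does not spell out how repeats are extracted. As a rough illustration only, the following minimal sketch shows one common character-based reading of the idea: count every character n-gram in the raw text and keep those that occur at least twice as new-word candidates. The function name `extract_repeats` and the parameters `min_len`, `max_len`, and `min_freq` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def extract_repeats(text, min_len=2, max_len=4, min_freq=2):
    """Return character n-grams that occur at least `min_freq` times.

    Hypothetical helper: a character-based sketch of repeats extraction,
    not the authors' implementation.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # Skip candidates that span punctuation or whitespace.
            if all(ch.isalnum() for ch in gram):
                counts[gram] += 1
    # Keep only the strings that actually repeat.
    return {gram: c for gram, c in counts.items() if c >= min_freq}


candidates = extract_repeats("微博很火，微博用户很多")
print(candidates)
```

A segmentation-based variant would run the same counting over word tokens produced by a Chinese word segmenter instead of over raw characters; which of the two performs better is the question the paper addresses.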
         
        
            Keywords : 
natural language processing; random processes; statistical analysis; CRF; Chinese NWD; Chinese new word detection; Chinese word segmentation based; annotated corpus; character based; conditional random field; large-scale corpus situation; mutual authentication; performance estimation; pragmatic quantitative model; repeats extraction; statistical framework; Analytical models; Data models; Dictionaries; Educational institutions; Feature extraction; Pragmatics; Support vector machines; Chinese Word Segmentation; New Words Detection; Repeats
         
        
        
        
            Conference_Title : 
Asian Language Processing (IALP), 2014 International Conference on
         
        
            Conference_Location : 
Kuching
         
        
        
            DOI : 
10.1109/IALP.2014.6973474