Regional Information Center for Science and Technology (RICeST) - Scaling Conditional Random Field with Application to Chinese Word Segmentation

DocumentCode :

1599359

Title :

Scaling Conditional Random Field with Application to Chinese Word Segmentation

Author :

Zhao, Hai ; Kit, Chunyu

Author_Institution :

City Univ. of Hong Kong, Kowloon

Volume :

fYear :

2007

Firstpage :

Lastpage :

Abstract :

As a powerful sequence labeling model, the conditional random field (CRF) has been applied successfully to a number of natural language processing (NLP) tasks. However, the high complexity of CRF training allows only a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. A predicted tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all sub-CRFs involved. To test its effectiveness, the approach is applied to Chinese word segmentation (CWS), treated as a character tagging problem. The evaluation shows that it reduces time and memory cost by 20-39% and 44-50%, respectively, without any significant performance loss on various large-scale data sets.
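The abstract does not spell out the joint decoding step, so the following Python sketch is only one plausible reading of it, not the authors' algorithm. It assumes a four-tag B/M/E/S character scheme (the record does not state the tag set), assumes each binary sub-CRF has already produced a per-character probability that its tag applies, and combines those probabilities with a Viterbi search restricted to legal tag transitions. The names `joint_decode`, `tags_to_words`, `ALLOWED`, `START`, and `END` are hypothetical.

```python
import math

# Assumed tag set: Begin, Middle, End of a multi-character word, Single-character word.
TAGS = ["B", "M", "E", "S"]

# Assumed legal transitions between consecutive character tags.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START = ("B", "S")  # tags that may open a sentence
END = ("E", "S")    # tags that may close a sentence


def joint_decode(probs):
    """Pick the best legal tag sequence from per-tag binary outputs.

    probs[i][tag] is the probability, from that tag's binary sub-model,
    that character i carries the tag. The scores are combined in log space.
    """
    if not probs:
        return []

    def score(i, tag):
        return math.log(max(probs[i][tag], 1e-12))

    best = [{t: (score(0, t) if t in START else -math.inf) for t in TAGS}]
    back = [{}]
    for i in range(1, len(probs)):
        cur, ptr = {}, {}
        for t in TAGS:
            # Best predecessor among tags that may legally precede t.
            prev_score, prev_tag = max(
                (best[i - 1][p], p) for p in TAGS if t in ALLOWED[p]
            )
            cur[t] = prev_score + score(i, t)
            ptr[t] = prev_tag
        best.append(cur)
        back.append(ptr)

    # Trace back from the best legal final tag.
    path = [max(END, key=lambda t: best[-1][t])]
    for i in range(len(probs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def tags_to_words(chars, tags):
    """Group characters into words, closing a word at each E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

Calling `joint_decode` on a list of per-character probability dictionaries returns one tag per character, and `tags_to_words` turns that tag sequence into a segmentation. Because each sub-model is trained independently, picking the highest-scoring tag at each position separately can yield an illegal sequence (for example, a sentence that starts with M); constraining the search to legal transitions is one simple way to reconcile the independent outputs.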

Keywords :

decoding; learning (artificial intelligence); natural language processing; text analysis; CRF learning; Chinese word segmentation; conditional random field scaling; joint decoding algorithm; natural language processing; predicted tag sequence; sequence labeling model; Computational complexity; Cost function; Decoding; Hidden Markov models; Labeling; Large-scale systems; Natural language processing; Tagging; Testing; Yttrium;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Computation, 2007. ICNC 2007. Third International Conference on

Conference_Location :

Haikou

Print_ISBN :

978-0-7695-2875-5

Type :

conf

DOI :

10.1109/ICNC.2007.648

Filename :

4344817

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1599359