Regional Information Center for Science and Technology - Semi-Supervised Sequence Labeling with Self-Learned Features

DocumentCode :

2771595

Title :

Semi-Supervised Sequence Labeling with Self-Learned Features

Author :

Qi, Yanjun ; Kuksa, Pavel ; Collobert, Ronan ; Sadamasa, Kunihiko ; Kavukcuoglu, Koray ; Weston, Jason

Author_Institution :

Machine Learning Dept., NEC Labs. America Inc., Princeton, NJ, USA

fYear :

2009

fDate :

6-9 Dec. 2009

Firstpage :

428

Lastpage :

437

Abstract :

Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. The performance is restricted by the availability of labeled words. To tackle this issue, we propose a semi-supervised approach to improve the sequence labeling procedure in IE through a class of algorithms with self-learned features (SLF). A supervised classifier can be trained with annotated text sequences and used to classify each word in a large set of unannotated sentences. By averaging predicted labels over all cases in the unlabeled corpus, SLF training builds class label distribution patterns for each word (or word attribute) in the dictionary and re-trains the current model iteratively adding these distributions as extra word features. Basic SLF models how likely a word could be assigned to target class types. Several extensions are proposed, such as learning words' class boundary distributions. SLF exhibits robust and scalable behaviour and is easy to tune. We applied this approach on four classical IE tasks: named entity recognition (German and English), part-of-speech tagging (English) and one gene name recognition corpus. Experimental results show effective improvements over the supervised baselines on all tasks. In addition, when compared with the closely related self-training idea, this approach shows favorable advantages.
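
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_slf_table` and `slf_feature`, the toy label set, and the uniform fallback for unseen words are all assumptions made for illustration. The sketch takes labels that a current model has already predicted on unannotated text and turns them into a per-word class distribution that could be appended as extra word features.

```python
from collections import defaultdict


def build_slf_table(tagged_tokens, labels):
    """Average predicted labels per word over an unlabeled corpus.

    tagged_tokens: iterable of (word, predicted_label) pairs, assumed to
    come from the current model run on unannotated sentences.
    Returns {word: [fraction of predictions for each label in `labels`]}.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = defaultdict(lambda: [0] * len(labels))
    for word, label in tagged_tokens:
        counts[word][index[label]] += 1
    table = {}
    for word, word_counts in counts.items():
        total = sum(word_counts)
        table[word] = [n / total for n in word_counts]
    return table


def slf_feature(table, word, labels):
    """Look up the distribution for a word.

    Unseen words fall back to a uniform distribution; this fallback is an
    assumption of the sketch, not something stated in the abstract.
    """
    return table.get(word, [1.0 / len(labels)] * len(labels))


# Toy predictions standing in for the output of a supervised tagger.
labels = ["O", "PER", "LOC"]
predicted = [
    ("Paris", "LOC"),
    ("Paris", "LOC"),
    ("Paris", "PER"),
    ("Paris", "LOC"),
    ("the", "O"),
]
table = build_slf_table(predicted, labels)
print(slf_feature(table, "Paris", labels))  # [0.0, 0.25, 0.75]
```

In an iterative setup of the kind the abstract outlines, the model would be retrained with these distributions as additional inputs, the unlabeled corpus would be re-tagged, and the table rebuilt. The retraining step is omitted here because the record does not specify the classifier.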

Keywords :

learning (artificial intelligence); natural language processing; annotated text sequences; gene name recognition corpus; information extraction systems; named entity recognition; natural language sequence; part-of-speech tagging; self-learned features; semisupervised sequence labeling; supervised classifier; tasks assigning labels; Computer science; Data mining; Labeling; Machine learning; National electric code; Natural language processing; Neural networks; Predictive models; Tagging; USA Councils; information extraction; self-learned features; semi-supervised feature learning; semi-supervised learning; sequence labeling; structural output learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on

Conference_Location :

Miami, FL

ISSN :

1550-4786

Print_ISBN :

978-1-4244-5242-2

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2009.40

Filename :

5360268

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2771595