DocumentCode
65912
Title
Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection
Author
Zitouni, Imed ; Benajiba, Yassine
Author_Institution
Microsoft, Redmond, WA, USA
Volume
22
Issue
2
fYear
2014
fDate
Feb. 2014
Firstpage
314
Lastpage
324
Abstract
In the last two decades, significant effort has been put into annotating linguistic resources in several languages. Despite this effort, many languages still have only small amounts of such resources. The goal of this article is to present and investigate a method of propagating information (specifically mentions) from a resource-rich language such as English into a relatively resource-poor language such as Arabic. We also compare this approach to its equivalent counterpart using monolingual resources. Part of the investigation is to quantify the contribution of propagating information under different conditions, based on the availability of resources in the target language. Experiments on the Arabic-English language pair show that one can achieve relatively decent performance by propagating information from a language with richer resources, such as English, into Arabic alone (with no resources or models in the source language, Arabic). Furthermore, results show that features propagated from English do help improve the performance of the Arabic system even when used in conjunction with all feature types built from the source language. Experiments also show that using propagated features in conjunction with lexically derived features only (as can be obtained directly from a mention-annotated corpus) brings system performance up to the level obtained in the target language by using features derived from many linguistic resources, therefore improving the system when such resources are not available.
Keywords
learning (artificial intelligence); natural language processing; aligned parallel corpora based semisupervised learning; mention detection; monolingual resources; resource rich language; source language; Entropy; Feature extraction; IEEE transactions; Pragmatics; Semisupervised learning; Speech; Speech processing; Information extraction; cross-lingual NLP; machine learning; mention detection; natural language processing
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2013.2287055
Filename
6646259