DocumentCode :
1695970
Title :
Statistical machine translation based text normalization with crowdsourcing
Author :
Schlippe, Tim ; Zhu, Chenfei ; Lemcke, Daniel ; Schultz, Tanja
Author_Institution :
Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
fYear :
2013
Firstpage :
8406
Lastpage :
8410
Abstract :
In [1], we proposed systems for text normalization based on statistical machine translation (SMT) methods, built with the support of Internet users, and evaluated them on French texts. In an annotation process, Internet users normalize text displayed in a web interface, thereby providing a parallel corpus of non-normalized and normalized text. From this corpus, SMT models are trained to translate non-normalized text into normalized text. In this paper, we analyze the efficiency of these systems for other languages. Additionally, we embedded the English annotation process for training data in Amazon Mechanical Turk and compare the quality of texts thoroughly annotated in our lab with that of texts annotated by the Turkers. Finally, we investigate how to reduce user effort by iteratively building an SMT system from the sentences annotated so far and applying it to the next sentences to be edited.
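The iterative procedure described at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' system: a position-aligned token substitution table stands in for the SMT model, the annotator is simulated by a lookup in a hypothetical `gold` dictionary, and user effort is approximated by the token-level edit distance between the system's suggestion and the final normalization. All function names, the batch size, and the example sentences are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_table(pairs):
    """Learn a token substitution table from (raw, normalized) pairs.

    Toy stand-in for SMT training (an assumption, not the paper's method):
    only pairs with equal token counts are used, aligned by position.
    """
    counts = defaultdict(Counter)
    for raw, norm in pairs:
        r, n = raw.split(), norm.split()
        if len(r) == len(n):
            for a, b in zip(r, n):
                counts[a][b] += 1
    # Keep the most frequent normalized form for each raw token.
    return {a: c.most_common(1)[0][0] for a, c in counts.items()}


def apply_table(table, sentence):
    """Replace each known token; unknown tokens pass through unchanged."""
    return " ".join(table.get(tok, tok) for tok in sentence.split())


def token_edits(hyp, ref):
    """Token-level Levenshtein distance, used here as a proxy for user effort."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, ht in enumerate(h, 1):
        cur = [i]
        for j, rt in enumerate(r, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ht != rt)))
        prev = cur
    return prev[-1]


def iterative_annotation(raw_sentences, annotate, batch_size=2):
    """Retrain on everything annotated so far before each new batch."""
    annotated, effort = [], []
    for start in range(0, len(raw_sentences), batch_size):
        table = train_table(annotated)
        for raw in raw_sentences[start:start + batch_size]:
            suggestion = apply_table(table, raw)
            final = annotate(raw, suggestion)  # the user corrects the suggestion
            effort.append(token_edits(suggestion, final))
            annotated.append((raw, final))
    return annotated, effort


# Hypothetical data: the "annotator" simply looks up the gold normalization.
gold = {
    "i have 2 cats": "i have two cats",
    "she has 2 dogs": "she has two dogs",
    "we saw 2 birds": "we saw two birds",
    "they own 2 cars": "they own two cars",
}
pairs, effort = iterative_annotation(list(gold), lambda raw, suggestion: gold[raw])
print(effort)  # [1, 1, 0, 0]
```

In this toy run the first batch is edited from scratch, while the second batch is pre-normalized by the table learned from the first, so the simulated annotator has nothing left to correct. A real SMT system would also have to handle reordering and one-to-many token mappings, which this sketch ignores.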
Keywords :
language translation; natural languages; statistical analysis; text analysis; Amazon Mechanical Turk; English annotation process; French texts; Internet users support; crowdsourcing; nonnormalized text; normalized text; parallel corpus; statistical machine translation; text normalization; training data; Computational modeling; Conferences; Internet; Noise measurement; Speech; Training; rapid language adaptation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639305
Filename :
6639305