DocumentCode :
1323458
Title :
Part-of-Speech Tagging by Latent Analogy
Author :
Bellegarda, Jerome R.
Author_Institution :
Apple Inc, Speech & Language Technol., Cupertino, CA, USA
Volume :
4
Issue :
6
fYear :
2010
Firstpage :
985
Lastpage :
993
Abstract :
Part-of-speech tagging is often a critical first step in various speech and language processing tasks. High-accuracy taggers (e.g., based on conditional random fields) rely on well-chosen feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. This makes them vulnerable to any discrepancy between training and tagging corpora, and, in particular, accuracy is adversely affected by the presence of out-of-vocabulary words. This paper explores an alternative tagging strategy based on the principle of latent analogy, which was originally introduced in the context of a speech synthesis application. In this approach, locally optimal tag subsequences emerge automatically from an appropriate representation of global sentence-level information. This solution eliminates the need for feature engineering, while exploiting a broader context more conducive to word sense disambiguation. Empirical evidence suggests that, in practice, tagging by latent analogy is essentially competitive with conventional Markovian techniques, while benefiting from substantially less onerous training costs. This opens up the possibility that integration with such techniques may lead to further improvements in tagging accuracy.
Keywords :
Markov processes; natural language processing; speech synthesis; Markovian techniques; empirical training distribution; global sentence-level information representation; language processing; latent analogy; part-of-speech tagging; speech processing; speech synthesis; word sense disambiguation; Hidden Markov models; Natural language processing; Semantics; Speech recognition; Statistical learning; Tagging; Training; Latent semantic mapping (LSM); natural language processing (NLP); part-of-speech (POS) disambiguation; sequence labeling; statistical modeling;
fLanguage :
English
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
Publisher :
IEEE
ISSN :
1932-4553
Type :
jour
DOI :
10.1109/JSTSP.2010.2075970
Filename :
5570877