Regional Information Center for Science and Technology (RICEST)

DocumentCode :

1323458

Title :

Part-of-Speech Tagging by Latent Analogy

Author :

Bellegarda, Jerome R.

Author_Institution :

Apple Inc, Speech & Language Technol., Cupertino, CA, USA

Volume :

Issue :

Year :

2010

Firstpage :

985

Lastpage :

993

Abstract :

Part-of-speech tagging is often a critical first step in various speech and language processing tasks. High-accuracy taggers (e.g., based on conditional random fields) rely on well-chosen feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. This makes them vulnerable to any discrepancy between training and tagging corpora, and, in particular, accuracy is adversely affected by the presence of out-of-vocabulary words. This paper explores an alternative tagging strategy based on the principle of latent analogy, which was originally introduced in the context of a speech synthesis application. In this approach, locally optimal tag subsequences emerge automatically from an appropriate representation of global sentence-level information. This solution eliminates the need for feature engineering, while exploiting a broader context more conducive to word sense disambiguation. Empirical evidence suggests that, in practice, tagging by latent analogy is essentially competitive with conventional Markovian techniques, while benefiting from substantially less onerous training costs. This opens up the possibility that integration with such techniques may lead to further improvements in tagging accuracy.

Keywords :

Markov processes; natural language processing; speech synthesis; Markovian techniques; empirical training distribution; global sentence-level information representation; language processing; latent analogy; part-of-speech tagging; speech processing; word sense disambiguation; Hidden Markov models; Semantics; Speech recognition; Statistical learning; Tagging; Training; Latent semantic mapping (LSM); natural language processing (NLP); part-of-speech (POS) disambiguation; sequence labeling; statistical modeling

Language :

English

Journal_Title :

IEEE Journal of Selected Topics in Signal Processing

Publisher :

IEEE

ISSN :

1932-4553

Type :

jour

DOI :

10.1109/JSTSP.2010.2075970

Filename :

5570877

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1323458