Title :
Latent semantic mapping: dimensionality reduction via globally optimal continuous parameter modeling
Author :
Bellegarda, Jerome R.
Author_Institution :
Speech & Language Technol., Apple Comput., Inc., Cupertino, CA
Abstract :
Originally formulated in the context of information retrieval, latent semantic analysis exhibits three main characteristics: (i) discrete entities (namely words and documents) are mapped onto a continuous vector space; (ii) this mapping is determined by global correlation patterns; and (iii) dimensionality reduction is an integral part of the process. Such fairly generic properties may be advantageous in a variety of different contexts, which motivates a broader interpretation of the underlying paradigm. The outcome is latent semantic mapping, a data-driven framework for modeling global relationships implicit in large volumes of (not necessarily textual) data. This paper gives a general overview of the framework, and underscores the multi-faceted benefits it can bring to a number of problems in natural language understanding and spoken language processing. It concludes with a discussion of the inherent trade-offs associated with the approach, and some perspectives on its general applicability to unsupervised information extraction.
Keywords :
correlation methods; natural languages; continuous vector space; dimensionality reduction; global correlation patterns; globally optimal continuous; information retrieval; latent semantic mapping; natural language understanding; spoken language processing; Content based retrieval; Data mining; Functional analysis; Information analysis; Information retrieval; Natural languages; Pattern analysis; Space technology; Speech; Vocabulary;
Conference_Title :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
DOI :
10.1109/ASRU.2005.1566490