DocumentCode :
3744817
Title :
Sparse non-negative matrix language modeling for geo-annotated query session data
Author :
Ciprian Chelba;Noam Shazeer
Author_Institution :
Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
fYear :
2015
Firstpage :
8
Lastpage :
14
Abstract :
The paper investigates the impact on query language modeling when using skip-grams within query as well as across queries in a given search session, in conjunction with the geo-annotation available for the query stream data. As modeling tool we use the recently proposed sparse non-negative matrix estimation technique, since it offers the same expressive power as the well-established maximum entropy approach in combining arbitrary context features. Experiments on the google.com query stream show that using session-level and geo-location context we can expect reductions in perplexity of 34% relative over the Kneser-Ney N-gram baseline; when evaluating on the "local" subset of the query stream, the relative reduction in PPL is 51% - more than a bit. Both sources of context information (geo-location, and previous queries in session) are about equally valuable in building a language model for the query stream.
Keywords :
"Training","Context","Data models","Predictive models","Feature extraction","Sparse matrices","Context modeling"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404767
Filename :
7404767