Title :
Sparse lexical representation for semantic entity resolution
Author :
Yuzhe Jin ; Kuansan Wang ; Emre Kiciman
Author_Institution :
Microsoft Res., Redmond, WA, USA
Abstract :
This paper addresses the problem of semantic entity resolution (SER), which aims to determine which entities in a knowledge base, if any, are mentioned in a given web document. The lexical features (e.g., words and phrases) that are critical to resolving the semantic entities are typically few compared with all the lexical features in the web document, and can therefore be modeled as sparse signals. Two techniques leveraging the principles of sparse signal recovery are proposed to identify the sparse, salient lexical features. The first, based on the Lasso algorithm with the l2-norm distance metric, attempts to recover all the salient lexical features at once. The second, Posterior Probability Pursuit (PPP), identifies salient features sequentially, one at a time, using the negative log posterior probability as the distance metric. Using a knowledge base of about 100 million entities, we show that the proposed techniques, which exploit the sparsity underlying SER, deliver substantial performance improvements over baseline methods that do not consider sparsity, demonstrating the potential of sparse signal techniques in entity-centric web information processing.
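To make the Lasso idea concrete, here is a minimal sketch of generic l1-regularized least squares solved by iterative soft-thresholding (ISTA). It is not the paper's formulation: the feature matrix `A`, the observation vector `y`, the penalty `lam`, and the function names are all hypothetical, chosen only to show how an l1 penalty drives the weights of non-salient features to exactly zero.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal step for the l1 penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_ista(A, y, lam, step, n_iter=200):
    """Minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 by ISTA.

    A is a list of rows. step should not exceed 1 / (largest eigenvalue
    of A^T A), otherwise the iteration may diverge.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual of the current fit: A x - y.
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
            for i in range(n_rows)
        ]
        # Gradient of the squared-error term: A^T (A x - y).
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by soft-thresholding.
        x = [
            soft_threshold(x[j] - step * grad[j], step * lam)
            for j in range(n_cols)
        ]
    return x


# Toy data (hypothetical): 4 observations, 3 candidate lexical features.
# The columns of A are orthonormal, so step = 1.0 is a valid step size.
A = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
y = [3.0, 0.1, -2.0, 0.7]

weights = lasso_ista(A, y, lam=0.5, step=1.0)
# Indices of features whose weight survived the l1 penalty.
salient = [j for j, w in enumerate(weights) if w != 0.0]
```

In this toy setup the weakly supported second feature is zeroed out, so `salient` keeps only the first and third features. A sequential method in the spirit of PPP would instead add one feature per step; its scoring rule is not specified in the abstract and is not sketched here.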
Keywords :
document handling; knowledge based systems; probability; semantic Web; Lasso algorithm; PPP; Web document; entity centric Web information processing; knowledge base; negative log posterior probability; posterior probability pursuit; salient lexical features; semantic entity resolution; sparse lexical representation; sparse signal recovery; Encyclopedias; Knowledge based systems; Measurement; Semantics; Signal resolution; Vectors; Lasso; Sparse signal recovery; posterior probability pursuit; semantic entity resolution;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639339