DocumentCode :
2953819
Title :
A local Latent Semantic Analysis-based kernel for document similarities
Author :
Aseervatham, Sujeevan
Author_Institution :
LIPN, Univ. Paris, Villetaneuse
fYear :
2008
fDate :
1-8 June 2008
Firstpage :
214
Lastpage :
219
Abstract :
The document similarity measure is a key component of textual data processing and largely determines the performance of a processing system. For the past decade, kernels have been used as similarity functions within inner-product-based algorithms such as the SVM for NLP problems, especially text categorization. In this paper, we present a semantic space constructed from latent concepts. The concepts are extracted using Latent Semantic Analysis (LSA). To take into account the specificity of each document category, we use local LSA to define the global semantic space. Furthermore, we propose a weighted semantic kernel for the global space. Experimental results on text categorization tasks show that this kernel performs better than global LSA kernels, especially for small LSA dimensions.
Keywords :
computational linguistics; data analysis; natural language processing; text analysis; document category; document similarity measure; global space; inner-product based algorithm; latent concepts; local latent semantic analysis; natural language processing problem; support vector machine; text categorization; textual data processing; weighted semantic kernel; Data processing; Frequency; Indexing; Information retrieval; Kernel; Matrix decomposition; Support vector machine classification; Support vector machines; Text analysis; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location :
Hong Kong
ISSN :
1098-7576
Print_ISBN :
978-1-4244-1820-6
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2008.4633792
Filename :
4633792