DocumentCode
2830441
Title
Semantically Rich Spaces for Document Clustering
Author
Basili, Roberto ; Marocco, Paolo ; Milizia, Daniele
Author_Institution
Dept. of Comput. Sci., Rome Univ., Rome
fYear
2008
fDate
1-5 Sept. 2008
Firstpage
43
Lastpage
47
Abstract
Dimensionality reduction techniques address a relevant problem of vector space models: the size of the dictionaries involved. Certain geometrical transformations of the original feature space, such as latent semantic analysis (LSA), aim to preserve and discover semantic relations between documents within low-dimensional spaces. In this paper, a linear transformation method, locality preserving projections (LPP), is evaluated on a document clustering task, and the results are compared with those of LSA. Here LPP is applied directly to the original space through an efficient C-based implementation, and different parameterizations are investigated. Experimental results suggest that LPP is an effective technique that can account for available a priori knowledge within an unsupervised learning framework.
Keywords
data reduction; document handling; information retrieval; pattern clustering; unsupervised learning; dimensionality reduction technique; document clustering; geometrical transformation; information retrieval; latent semantic analysis; linear transformation method; locality preserving projection; semantic relation discovery; unsupervised learning framework; vector space model problem; Databases; Dictionaries; Expert systems; Functional analysis; Independent component analysis; Information retrieval; Large-scale systems; Linear discriminant analysis; Solid modeling; Vectors; Document clustering; Latent Semantic Analysis; Linear embedding; Locality Preserving Projection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
Conference_Location
Turin
ISSN
1529-4188
Print_ISBN
978-0-7695-3299-8
Type
conf
DOI
10.1109/DEXA.2008.109
Filename
4624689