• DocumentCode
    2830441
  • Title
    Semantically Rich Spaces for Document Clustering
  • Author
    Basili, Roberto ; Marocco, Paolo ; Milizia, Daniele
  • Author_Institution
    Dept. of Comput. Sci., Rome Univ., Rome
  • fYear
    2008
  • fDate
    1-5 Sept. 2008
  • Firstpage
    43
  • Lastpage
    47
  • Abstract
    Dimensionality reduction techniques address a relevant problem of vector space models, namely the size of the dictionaries involved. Certain geometrical transformations applied to the original feature space, such as latent semantic analysis (LSA), aim at preserving and discovering semantic relations between documents within low-dimensional spaces. In this paper, a linear transformation method named locality preserving projections (LPP) is evaluated on a document clustering task, and the results are compared with those of LSA. Here LPP is applied directly to the original space through an efficient C-based implementation, and different parameterizations are investigated. Experimental results suggest that LPP is an effective technique that can account for the availability of a priori knowledge within an unsupervised learning framework.
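The two transformations the abstract compares can be illustrated with a minimal NumPy sketch. It is not the authors' C implementation: the function names and the parameters `n_neighbors`, `t` and `ridge` are hypothetical. It assumes a dense terms x documents matrix `X` (one column per document), a Euclidean k-nearest-neighbour graph with heat-kernel weights as in the standard LPP formulation, and that `X D X^T` is nonsingular (for example, more documents than terms). LSA keeps the top-k singular directions of `X`. LPP instead solves the generalized eigenproblem `X L X^T a = lambda X D X^T a` and keeps the eigenvectors with the smallest eigenvalues, so documents that are neighbours in the graph stay close after projection.

```python
import numpy as np


def lsa(X, k):
    """Latent semantic analysis: project the documents (columns of the
    terms x docs matrix X) onto the top-k left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T @ X  # k x docs, equal to S_k V_k^T


def lpp_matrices(X, n_neighbors=3, t=1.0, ridge=1e-9):
    """Build both sides of the LPP eigenproblem X L X^T a = lam X D X^T a.
    n_neighbors, t and ridge are illustrative (hypothetical) parameters."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Squared Euclidean distances between document columns.
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    np.fill_diagonal(dist, np.inf)  # a document is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:n_neighbors]
        W[i, nbrs] = np.exp(-dist[i, nbrs] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)  # symmetrise the k-NN graph
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    lhs = X @ L @ X.T
    # Tiny ridge for numerical safety only; assumes X D X^T is nonsingular.
    rhs = X @ D @ X.T + ridge * np.eye(X.shape[0])
    return lhs, rhs


def lpp(X, k, n_neighbors=3, t=1.0):
    """Return (A, eigenvalues); the k-dimensional embedding of X is A.T @ X."""
    lhs, rhs = lpp_matrices(X, n_neighbors, t)
    # Reduce to a standard symmetric problem via the Cholesky factor rhs = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(rhs))
    vals, Y = np.linalg.eigh(C_inv @ lhs @ C_inv.T)  # ascending eigenvalues
    A = C_inv.T @ Y[:, :k]  # smallest eigenvalues preserve locality
    return A, vals[:k]
```

Both functions return a low-dimensional representation that a clustering algorithm such as k-means could consume: `lsa(X, k)` directly, and `A.T @ X` for LPP. When terms outnumber documents, `X D X^T` becomes singular, and the usual remedy is a preliminary SVD/LSA projection. The abstract's point that LPP is applied "directly on the original space" suggests the paper avoids that step, but how it does so is not stated here.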
  • Keywords
    data reduction; document handling; information retrieval; pattern clustering; unsupervised learning; dimensionality reduction technique; document clustering; geometrical transformation; information retrieval; latent semantic analysis; linear transformation method; locality preserving projection; semantic relation discovery; unsupervised learning framework; vector space model problem; Databases; Dictionaries; Expert systems; Functional analysis; Independent component analysis; Information retrieval; Large-scale systems; Linear discriminant analysis; Solid modeling; Vectors; Document clustering; Latent Semantic Analysis; Linear embedding; Locality Preserving Projection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
  • Conference_Location
    Turin
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-3299-8
  • Type
    conf
  • DOI
    10.1109/DEXA.2008.109
  • Filename
    4624689