• DocumentCode
    2844104
  • Title
    Agglomeration and Elimination of Terms for Dimensionality Reduction
  • Author
    Ciarelli, Patrick Marques ; Oliveira, Elias
  • Author_Institution
    Dept. of Electr. Eng., Univ. Fed. do Espirito Santo, Vitoria, Brazil
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 2 2009
  • Firstpage
    547
  • Lastpage
    552
  • Abstract
    The vector space model is the usual representation of text databases for computational treatment. However, in this representation, synonyms and/or related terms are treated as independent. Furthermore, some terms add no information at all to the set of text documents; on the contrary, they may even harm the performance of information retrieval techniques. Several techniques have been proposed in the literature to reduce this problem. In this work, we present a method to tackle it. To validate our approach, we carried out a series of experiments on four databases and compared the results with those of other well-known techniques. In all cases, our method performed as well as or better than the other techniques from the literature.
  • Keywords
    database management systems; information retrieval; text analysis; computational treatment; dimensionality reduction; information retrieval techniques; representation synonyms; text documents; texts database; vector space model; Costs; Data mining; Deductive databases; Feature extraction; Frequency; Information retrieval; Information science; Intelligent systems; Spatial databases; Text categorization; agglomeration of terms; dimensionality reduction; feature selection; text classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-1-4244-4735-0
  • Electronic_ISBN
    978-0-7695-3872-3
  • Type
    conf
  • DOI
    10.1109/ISDA.2009.9
  • Filename
    5364970