Agglomeration and Elimination of Terms for Dimensionality Reduction

Author

Ciarelli, Patrick Marques ; Oliveira, Elias

Author_Institution

Dept. of Electr. Eng., Univ. Fed. do Espirito Santo, Vitoria, Brazil

fYear

2009

fDate

Nov. 30 2009-Dec. 2 2009

Firstpage

547

Lastpage

552

Abstract

The vector space model is the usual representation of text databases for computational treatment. In this representation, however, synonyms and related terms are treated as independent. Furthermore, some terms add no information to the set of text documents and may even harm the performance of information retrieval techniques. Several techniques have been proposed in the literature to mitigate these problems. In this work we present a method to tackle them. To validate our approach, we carried out a series of experiments on four databases and compared the results with those of other well-known techniques. In all cases, our method performed as well as or better than the other techniques from the literature.

Keywords

database management systems; information retrieval; text analysis; computational treatment; dimensionality reduction; information retrieval techniques; representation synonyms; text documents; texts database; vector space model; Costs; Data mining; Deductive databases; Feature extraction; Frequency; Information retrieval; Information science; Intelligent systems; Spatial databases; Text categorization; agglomeration of terms; feature selection; text classification

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on

Conference_Location

Pisa

Print_ISBN

978-1-4244-4735-0

Electronic_ISBN

978-0-7695-3872-3

Type

conf

DOI

10.1109/ISDA.2009.9

Filename

5364970