DocumentCode
2844104
Title
Agglomeration and Elimination of Terms for Dimensionality Reduction
Author
Ciarelli, Patrick Marques ; Oliveira, Elias
Author_Institution
Dept. of Electr. Eng., Univ. Fed. do Espirito Santo, Vitoria, Brazil
fYear
2009
fDate
Nov. 30 2009-Dec. 2 2009
Firstpage
547
Lastpage
552
Abstract
The vector space model is the usual representation of text databases for computational treatment. However, in this representation synonyms and related terms are treated as independent. Furthermore, some terms add no information to the set of text documents; on the contrary, they may even harm the performance of information retrieval techniques. Several techniques have been proposed in the literature to reduce this problem. In this work we present a method to tackle it. To validate our approach, we carried out a series of experiments on four databases and compared the results with those of other well-known techniques. In all cases, our method performed as well as or better than the other techniques from the literature.
Keywords
database management systems; information retrieval; text analysis; computational treatment; dimensionality reduction; information retrieval techniques; representation synonyms; text documents; texts database; vector space model; Costs; Data mining; Deductive databases; Feature extraction; Frequency; Information retrieval; Information science; Intelligent systems; Spatial databases; Text categorization; agglomeration of terms; feature selection; text classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
Conference_Location
Pisa
Print_ISBN
978-1-4244-4735-0
Electronic_ISBN
978-0-7695-3872-3
Type
conf
DOI
10.1109/ISDA.2009.9
Filename
5364970