DocumentCode :
650481
Title :
Using Otsu's Threshold Selection Method for Eliminating Terms in Vector Space Model Computation
Author :
Medeiros Eler, Danilo ; Garcia, Rogerio Eduardo
Author_Institution :
Dept. de Mat. e Comput., Univ. Estadual Paulista, Presidente Prudente, Brazil
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
220
Lastpage :
226
Abstract :
Visualization techniques have proved to be valuable tools for exploring textual data, and dimensionality reduction techniques are widely used to produce visual representations of document collections. For multidimensional projection techniques, the quality of the visual result depends on how well the terms chosen to compose the vector space model (VSM) discriminate between documents. Defining a good VSM requires applying filters during preprocessing to eliminate terms based on their frequency. To do so, the user must evaluate the term frequency histogram, drawing on his or her expertise in the subject of the text, and choose a threshold value for the frequency cut. This is usually a trial-and-error process that requires the user to check the quality of the visual representation after each trial. In this paper, we propose an automatic approach that applies Otsu's Threshold Selection Method to compute a threshold from a term frequency histogram. Our experiments show that the approach generates visual representations as good as those generated with a threshold obtained by trial and error. The contribution of our approach is that users without expertise can generate good visual representations, and the time needed to find a good threshold is reduced.
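The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: Otsu's method applied to a list of term frequencies instead of image gray levels. The function name `otsu_threshold` and the sample frequencies are hypothetical and are not taken from the paper. The sketch picks the frequency value that maximizes the between-class variance of the two groups it separates.

```python
from collections import Counter


def otsu_threshold(frequencies):
    """Return the frequency t that best splits the values into {f <= t} and {f > t}.

    "Best" means maximal between-class variance, the criterion used by
    Otsu's method. This is an illustrative sketch, not the paper's code.
    """
    if not frequencies:
        raise ValueError("frequencies must not be empty")

    hist = Counter(frequencies)           # histogram: frequency value -> number of terms
    levels = sorted(hist)
    total = len(frequencies)
    total_sum = sum(level * count for level, count in hist.items())

    best_t, best_score = levels[0], -1.0
    n_low = 0                             # number of terms in the lower class
    sum_low = 0                           # sum of frequencies in the lower class
    for level in levels[:-1]:             # the last level would leave the upper class empty
        n_low += hist[level]
        sum_low += level * hist[level]
        n_high = total - n_low
        mean_low = sum_low / n_low
        mean_high = (total_sum - sum_low) / n_high
        # Between-class variance, up to the constant factor 1 / total**2.
        score = n_low * n_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, level
    return best_t


# Hypothetical term frequencies for a tiny collection.
term_freq = {"the": 50, "of": 48, "and": 47, "histogram": 4, "otsu": 3, "vsm": 2}
t = otsu_threshold(list(term_freq.values()))
low_group = sorted(term for term, f in term_freq.items() if f <= t)
high_group = sorted(term for term, f in term_freq.items() if f > t)
```

Which of the two groups is discarded when building the VSM, and whether further cuts are applied, is a design decision that the abstract does not specify; the sketch only computes the split point.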
Keywords :
data mining; data visualisation; text analysis; Otsu threshold selection method; dimensionality reduction techniques; document collections; multidimensional projection techniques; term frequency histogram; textual data exploration; vector space model computation; visual representations; visualization techniques; Otsu's Threshold Selection Method; Term Frequency Thresholding; Vector Space Model Computation; Visual Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Visualisation (IV), 2013 17th International Conference
Conference_Location :
London
ISSN :
1550-6037
Type :
conf
DOI :
10.1109/IV.2013.29
Filename :
6676566