DocumentCode :
2504989
Title :
A High-Dimensional Access Method for Approximated Similarity Search in Text Mining
Author :
Artigas-Fuentes, F. ; Gil-García, R. ; Badía-Contelles, J.M.
Author_Institution :
CERPAMID, Univ. de Oriente, Santiago de Cuba, Cuba
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
3155
Lastpage :
3158
Abstract :
In this paper, a new access method for very high-dimensional data spaces is proposed. The method uses a graph structure and pivots to index objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or similarity-based functions to obtain the k-nearest neighbors of novel query objects. The method shows good selectivity over very high-dimensional data spaces and better performance than other state-of-the-art methods. Although it is a probabilistic method, it shows a low error rate. The method is evaluated on data sets with thousands of dimensions drawn from the well-known Reuters Corpus Volume 1 collection (RCV1-v2).
Keywords :
data mining; graph theory; indexing; information retrieval; probability; text analysis; distance based function; document indexing; graph structure; high-dimensional access method; k-nearest neighbor; object indexing; probabilistic method; similarity based function; similarity search; text mining; very high-dimensional data space; Artificial neural networks; Indexing; Search problems; Text mining; Training; access method; approximated search; high-dimensional spaces
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.772
Filename :
5597302