Title :
E-VSM: Novel text representation model to capture context-based closeness between two text documents
Author :
Bhakkad, Ankit ; Dharmadhikari, S.C. ; Emmanuel, M. ; Kulkarni, Parag
Author_Institution :
Dept. of IT, Pune Institute of Computer Technology, India
Abstract :
Many applications in Information Retrieval and Text Mining need an intelligent system that calculates the closeness between two text documents. For this task, the representation of a text document as a mathematical object plays a vital role. The Vector Space Model is the most popular way to represent a text document in mathematical form, but it is lossy: it discards the ordering of terms in the document and, with it, the document's context. Existing measures of closeness between two text documents, such as Cosine Similarity and Euclidean Distance, are efficient but do not take the context of the document into account. In this paper we propose E-VSM (Enhanced Vector Space Model) to overcome the limitations of the original Vector Space Model, together with a new ‘Density-based Clustering’ approach to calculate context-based closeness between two text documents, which outperforms the state of the art in terms of accuracy. Experiments show good results, especially when the text document being compared is very close to a particular region of the target text document.
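The limitation of the plain Vector Space Model named in the abstract can be illustrated with a minimal sketch. It does not implement E-VSM or the density-based clustering approach, whose details are not given in this record. It only shows that a bag-of-words term-frequency vector, under Cosine Similarity and Euclidean Distance, cannot tell apart two sentences that use the same words in a different order. The function names and the two example sentences are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    # Plain VSM sketch: term frequencies only, so word order is discarded here.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Counter returns 0 for missing terms, so the union of terms is safe to scan.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


# Hypothetical example: same words, opposite meaning.
doc1 = "the dog bit the man"
doc2 = "the man bit the dog"

v1, v2 = term_vector(doc1), term_vector(doc2)
sim = cosine_similarity(v1, v2)
dist = euclidean_distance(v1, v2)

print("identical vectors:", v1 == v2)
print("cosine similarity:", sim)
print("euclidean distance:", dist)
```

Both sentences map to the same term-frequency vector, so both measures treat them as the same document even though their meanings differ. This is the loss of ordering, and therefore of context, that the abstract attributes to the original model.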
Keywords :
Integrated optics; Noise; Optical imaging; Optical noise; Context-Based Closeness; Density-Based Clustering; Intelligent System; Vector Space Model;
Conference_Title :
Intelligent Systems and Control (ISCO), 2013 7th International Conference on
Conference_Location :
Coimbatore, Tamil Nadu, India
Print_ISBN :
978-1-4673-4359-6
DOI :
10.1109/ISCO.2013.6481176