DocumentCode :
1974292
Title :
Self-organising map for document categorization using latent semantic analysis
Author :
Mahalakshmi, B. ; Duraiswamy, K.
Author_Institution :
Dept. of CSE, Anna Univ. Coimbatore, Tiruchengode, India
fYear :
2010
fDate :
12-13 Feb. 2010
Firstpage :
1
Lastpage :
6
Abstract :
With the increasing amount of unstructured content available electronically on the web, content categorization has become very important for efficient information retrieval. The basic approaches to information retrieval in text documents are keyword search, document categorization and stream filtering. To extract information from raw data, its complexity must first be reduced. Clustering methods aim to reduce the amount of data, and projection methods aim to reduce its dimensionality. The self-organising map (SOM) is a special case in that it can be used for clustering and projection at the same time; it projects the data onto a 2D grid. Various methods have been developed for automatically clustering World Wide Web documents according to user requirements. The objective of this paper is to reduce the time and effort the user needs to find the information sought. The method termed topological organization of content (TOC) can generate classified topics from a set of unstructured documents. TOC is a set of hierarchically organized, growing 1D-SOMs, in which the vector space model is used to index each 1D-SOM. In the proposed approach, latent semantic indexing (LSI) of the 1D-SOM is used to strengthen the association between terms. Latent semantic analysis is a technique that projects the original high-dimensional document vectors into a space of latent semantic dimensions. A term-by-document matrix is constructed for information retrieval. A brief review of existing methods for document clustering and organization is given. The proposed method, which uses LSI, will be efficient in terms of computational cost, accuracy and visualization, can easily be adapted to large data sets, and will provide a facility for retrieving meaningful related topics.
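The pipeline outlined in the abstract (build a term-by-document matrix, project the documents into a latent semantic space, then organize them with a 1D-SOM) can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices are assumptions: raw term frequencies without tf-idf weighting, a truncated SVD computed by power iteration with deflation (workable only for tiny toy matrices; a real system would call a library SVD), unit-length normalization of the LSA vectors, and a single fixed-size 1D-SOM instead of TOC's hierarchy of growing 1D-SOMs. The function names (`term_document_matrix`, `lsa_doc_vectors`, `train_1d_som`, `assign`), the training schedule and the toy documents are hypothetical.

```python
# Hypothetical sketch: not the paper's method; all names and parameters are illustrative.
import math
import random


def term_document_matrix(docs):
    """Build a raw term-frequency term-by-document matrix (rows = terms, columns = documents)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[row[w]][j] += 1.0
    return vocab, A


def lsa_doc_vectors(A, k, iters=200, seed=0):
    """Map each document to k latent dimensions via a truncated SVD of A.

    The right singular vectors of A are eigenvectors of G = A^T A; they are found
    here by power iteration with deflation (only practical for tiny matrices).
    Document j becomes (sigma_1 v_1[j], ..., sigma_k v_k[j]), rescaled to unit length.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    coords = [[0.0] * k for _ in range(n_docs)]
    for c in range(k):
        v = [rng.random() for _ in range(n_docs)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n_docs)) for i in range(n_docs)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # the remaining spectrum is zero
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue lambda = sigma^2.
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n_docs)) for i in range(n_docs))
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n_docs):
            coords[j][c] = sigma * v[j]
        # Deflate so the next pass converges to the next-largest eigenpair.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n_docs)] for i in range(n_docs)]
    unit = []
    for x in coords:
        norm = math.sqrt(sum(a * a for a in x)) or 1.0
        unit.append([a / norm for a in x])
    return unit


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_1d_som(data, n_nodes, epochs=50, seed=0):
    """Train a fixed-size 1D SOM (a chain of nodes) with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]  # initialise from data points
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the documents
            frac = step / total
            lr = 0.5 * (1.0 - frac)                          # linearly decaying learning rate
            radius = max(0.5, (n_nodes / 2) * (1.0 - frac))  # neighbourhood width along the chain
            bmu = min(range(n_nodes), key=lambda i: _dist2(nodes[i], x))
            for i in range(n_nodes):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (xi - w) for w, xi in zip(nodes[i], x)]
            step += 1
    return nodes


def assign(data, nodes):
    """Return the index of the best-matching node for each document."""
    return [min(range(len(nodes)), key=lambda i: _dist2(nodes[i], x)) for x in data]


docs = [
    "neural network training data",
    "neural network learning",
    "stock market trading prices",
    "market prices trading volume",
]
vocab, A = term_document_matrix(docs)
X = lsa_doc_vectors(A, k=2)
labels = assign(X, train_1d_som(X, n_nodes=2))
print(labels)
```

Because the two pairs of toy documents share no terms, they separate along different latent dimensions, and a two-node chain is expected to place each pair on its own node; the actual node indices depend on the random seed.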
Keywords :
document handling; information retrieval; pattern clustering; self-organising feature maps; 2D grid; LSI; clustering methods; document categorization; high dimensional document vector; information retrieval; latent semantic analysis; projection methods; self organising map; text documents; topological content organization; unstructured content; worldwide Web documents; Clustering methods; Computational efficiency; Content based retrieval; Data mining; Functional analysis; Indexing; Information filtering; Information filters; Information retrieval; Large scale integration; Document categorization; information retrieval; latent semantic analysis (LSA); one-dimensional self-organizing map (1D-SOM);
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovative Computing Technologies (ICICT), 2010 International Conference on
Conference_Location :
Tamil Nadu
Print_ISBN :
978-1-4244-6488-3
Type :
conf
DOI :
10.1109/ICINNOVCT.2010.5440089
Filename :
5440089