DocumentCode :
3539754
Title :
Hybrid neural network model for web document clustering
Author :
Hemalatha, M. ; Srinivas, D. Sathya
Author_Institution :
Dept of Comput. Sci., Karpagam Univ., Coimbatore, India
fYear :
2009
fDate :
4-6 Aug. 2009
Firstpage :
531
Lastpage :
538
Abstract :
The popularity of the Internet has caused a massive increase in the number of Web pages. This information explosion poses a growing challenge for information retrieval systems, and document clustering has become an important process for helping these systems organize this vast amount of data. Grouping similar documents into clusters is believed to help users find relevant information more quickly and to focus their search in the appropriate direction. Feature selection is an important task in data analysis: it limits redundancy among features, promotes comprehensibility, and helps find clusters (or structures) hidden in high-dimensional data. This paper addresses document-mining problems related to Web page clustering and classification, using principal component analysis for feature vector selection. Singular value decomposition is used to compute the similarity measure, and a multilayer neural network is used to improve the performance of the clustering algorithm. We illustrate and discuss the system's performance through experimental evaluation results.
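The feature-reduction and similarity steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes documents are already represented as a term-frequency matrix (rows are documents, columns are terms), computes principal component analysis through the SVD of the column-centred matrix, and uses cosine similarity in the reduced space as the similarity measure. The multilayer neural network stage is not shown, and the function names (`pca_project`, `cosine_similarity`) and the toy matrix are hypothetical.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X (documents x terms) onto the top-k principal components.

    Hypothetical helper: PCA is computed via the SVD of the column-centred matrix.
    """
    Xc = X - X.mean(axis=0)                     # centre each term (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                        # documents in k-dimensional PC space


def cosine_similarity(Z):
    """Pairwise cosine similarity between the rows of Z (hypothetical helper)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                     # guard against all-zero rows
    Zn = Z / norms
    return Zn @ Zn.T


# Toy term-frequency matrix: 4 documents x 5 terms (made-up data).
# Documents 0 and 1 share vocabulary; so do documents 2 and 3.
X = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 0, 1, 0],
    [0, 0, 4, 3, 0],
    [0, 1, 3, 4, 0],
], dtype=float)

Z = pca_project(X, k=2)
sim = cosine_similarity(Z)
print(np.round(sim, 2))
```

In the reduced space, documents that share vocabulary end up with positive similarity and documents from the other group with negative similarity; a clustering stage (in the paper, one aided by a multilayer neural network) would then operate on `Z` or `sim`.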
Keywords :
Internet; data analysis; data mining; document handling; feature extraction; information retrieval; information retrieval systems; learning (artificial intelligence); multilayer perceptrons; pattern classification; pattern clustering; principal component analysis; singular value decomposition; Internet; Web document clustering; Web page classification; data analysis; document mining; feature vector selection; information retrieval system; information search; machine learning; multilayer neural network; principal component analysis; similarity measure; singular value decomposition; Clustering algorithms; Data analysis; Explosions; Information retrieval; Internet; Multi-layer neural network; Neural networks; Singular value decomposition; System performance; Web pages; Multilayer Neural Network; Principal Component Analysis; Singular Value Decomposition; Web document Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
Type :
conf
DOI :
10.1109/ICADIWT.2009.5273918
Filename :
5273918