DocumentCode :
2727276
Title :
A Comparison of Dimensionality Reduction Techniques for Web Structure Mining
Author :
Chikhi, Nacim Fateh ; Rothenburger, Bernard ; Aussenac-Gilles, Nathalie
Author_Institution :
Univ. Paul Sabatier, Toulouse
fYear :
2007
fDate :
2-5 Nov. 2007
Firstpage :
116
Lastpage :
119
Abstract :
In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. In this paper, we therefore investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in Web hyperlink connectivity. We apply and compare four DRTs, namely principal component analysis (PCA), non-negative matrix factorization (NMF), independent component analysis (ICA), and random projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; and the well-known WebKb dataset, used in a large number of works on hyperlink connectivity analysis, seems ill-suited to this task, so we suggest using the recent Wikipedia dataset instead, which is better suited.
Keywords :
data mining; independent component analysis; matrix decomposition; principal component analysis; Web hyperlink connectivity; Web structure mining; Wikipedia dataset; dimensionality reduction; non-negative matrix factorization; random projection; Algorithm design and analysis; Intelligent structures; Stability analysis; Topology; Web mining; Web pages; Wikipedia
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, IEEE/WIC/ACM International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3026-0
Type :
conf
DOI :
10.1109/WI.2007.86
Filename :
4427077