DocumentCode
2864734
Title
Higher-order Web link analysis using multilinear algebra
Author
Kolda, Tamara G. ; Bader, Brett W. ; Kenny, Joseph P.
Author_Institution
Sandia Nat. Labs., Livermore, CA, USA
fYear
2005
fDate
27-30 Nov. 2005
Abstract
Linear algebra is a powerful and proven tool in Web search. Techniques such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg score Web pages based on the principal eigenvector (or singular vector) of a particular non-negative matrix that captures the hyperlink structure of the Web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the Web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of the SVD, to automatically identify topics in the collection along with the associated authoritative Web pages.
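The approach in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It rests on several assumptions: the tensor is stored densely rather than sparsely, its entries are raw link/term counts with no weighting, and the PARAFAC (CP) model is fitted by plain alternating least squares initialized from leading singular vectors. The helpers `build_tensor`, `cp_als` and `hits` and the toy two-topic graph are hypothetical. For comparison, `hits` collapses the term mode back into an ordinary adjacency matrix and takes its principal singular vectors, in the manner of HITS.

```python
import numpy as np


def build_tensor(edges, n_pages, terms):
    """Hypothetical helper: X[i, j, k] counts links i -> j whose anchor text contains term k.

    Dense for brevity (a real Web graph needs a sparse representation);
    raw counts without term weighting are an assumption of this sketch.
    """
    col = {t: k for k, t in enumerate(terms)}
    X = np.zeros((n_pages, n_pages, len(terms)))
    for src, dst, term in edges:
        X[src, dst, col[term]] += 1.0
    return X


def khatri_rao(B, C):
    """Column-wise Kronecker product: row j * C.shape[0] + k holds B[j, r] * C[k, r]."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])


def cp_als(X, rank, n_iter=100):
    """Rank-`rank` CP/PARAFAC model fitted by alternating least squares.

    Assumes rank <= each tensor dimension (needed by the SVD-based start).
    Returns (weights, A, B, C) with unit-norm columns such that
    X[i, j, k] ~= sum_r weights[r] * A[i, r] * B[j, r] * C[k, r];
    A scores hubs, B scores authorities, C scores anchor-text terms.
    """
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding, columns j*K + k
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding, columns i*K + k
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding, columns i*J + j
    # Start B and C from leading left singular vectors; A is solved for first.
    B = np.linalg.svd(X2, full_matrices=False)[0][:, :rank]
    C = np.linalg.svd(X3, full_matrices=False)[0][:, :rank]
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other two factors fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    # Flip signs in pairs so B and C columns sum to >= 0; the model is unchanged.
    for M in (B, C):
        flip = np.where(M.sum(axis=0) < 0, -1.0, 1.0)
        M *= flip
        A *= flip
    # Move the column norms of all three factors into `weights`.
    weights = np.ones(rank)
    for M in (A, B, C):
        norms = np.linalg.norm(M, axis=0)
        norms[norms == 0] = 1.0
        M /= norms
        weights *= norms
    return weights, A, B, C


def hits(X):
    """HITS-style scores: principal singular vectors of the term-collapsed adjacency matrix."""
    U, _, Vt = np.linalg.svd(X.sum(axis=2))
    return np.abs(U[:, 0]), np.abs(Vt[0])  # (hubs, authorities)


# Toy graph (hypothetical): pages 0-2 link to 3-4 with anchor text "cars",
# pages 3-4 link to 5-6 with anchor text "cats".
terms = ["cars", "cats"]
edges = [(s, d, "cars") for s in (0, 1, 2) for d in (3, 4)]
edges += [(s, d, "cats") for s in (3, 4) for d in (5, 6)]
X = build_tensor(edges, n_pages=7, terms=terms)

weights, A, B, C = cp_als(X, rank=2)
for r in range(2):
    print(f"weight {weights[r]:.3f} term {terms[C[:, r].argmax()]} "
          f"authorities {np.flatnonzero(B[:, r] > 0.5).tolist()}")
# weight 2.449 term cars authorities [3, 4]
# weight 2.000 term cats authorities [5, 6]

hubs, auths = hits(X)
print("HITS authorities", np.flatnonzero(auths > 0.5).tolist())
# HITS authorities [3, 4]
```

On this toy graph, each rank-1 term pairs one community's authorities with its anchor term. The single HITS authority vector recovers only the dominant "cars" community and carries no term labels. The SVD-based start is one common initialization choice; random starts are a frequent alternative but may need more iterations.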
Keywords
Internet; linear algebra; text analysis; HITS algorithm; PageRank algorithm; Web graph; Web search; anchor text; higher-order Web link analysis; higher-order representation; hyperlink graph; hyperlink structure; multilinear PARAFAC tensor decomposition; multilinear algebra; nonnegative matrix; principal eigenvector; score Web pages; singular vector; sparse three-way tensor; Labeling; Laboratories; Linear algebra; Search engines; Tensile stress; Testing; Topology; Vectors; Web pages; Web search;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining, Fifth IEEE International Conference on
ISSN
1550-4786
Print_ISBN
0-7695-2278-5
Type
conf
DOI
10.1109/ICDM.2005.77
Filename
1565685