Author_Institution :
Interdiscipl. Center for Sci. Comput. (IWR), Univ. of Heidelberg, Heidelberg, Germany
Abstract :
Many relationships naturally come in a bipartite setting: authors that write articles, proteins that interact with genes, or customers that buy, rent or rate products. Often we are interested in the clustering behavior of one side of the graph, i.e., in finding groups of similar articles or products. To find these clusters, a one-mode projection is classically applied, which results in a normal graph that can then be clustered by various methods. For data with strongly skewed degree distributions, a classical one-mode projection leads to very dense graphs with little information. In this article we propose a new method for a meaningful one-mode projection of any kind of bipartite graph B to a sparse general graph G, using a modified version of the so-called leverage. We provide ample experimental evidence that the method creates edges in G only between statistically significant neighbors and that the results are reliable and stable. For this, we present an output sensitive algorithm to compute Kendall´s τ. Moreover, for a subset of films in the Netflix prize data set, we can prove that the proposed method not only detects the statistically significantly co-rented films in the data set but that these are also the films that are the most similar ones by content. Thus, our method cannot only be used for the one-mode projection of bipartite graphs in general but also especially for any kind of market basket data to find pairs of most similar products as needed for, e.g., recommendation systems.
Keywords :
graph theory; pattern clustering; Netflix prize data; bipartite graphs; bipartite setting; clustering behavior; market basket data; one mode projection; recommendation systems; Approximation methods; Bipartite graph; Correlation; Data models; Quality assessment; Runtime; Stability analysis; bipartite graphs; network analysis; recommendation systems;