DocumentCode
3106328
Title
Mining Generalized Graph Patterns Based on User Examples
Author
Dmitriev, Pavel ; Lagoze, Carl
Author_Institution
Dept. of Comput. Sci., Cornell Univ., Ithaca, NY
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
857
Lastpage
862
Abstract
There has been a lot of recent interest in mining patterns from graphs. Often, the exact structure of the patterns of interest is not known. This happens, for example, when molecular structures are mined to discover fragments useful as features in chemical compound classification tasks, or when web sites are mined to discover sets of web pages representing logical documents. Such patterns are often generated from a few small subgraphs (cores), according to certain generalization rules (GRs). We call such patterns "generalized patterns" (GPs). Although structurally different, GPs often perform the same function in the network. Previously proposed approaches to mining GPs assumed either that the cores and the GRs are given, or that all interesting GPs are frequent. These are strong assumptions, which often do not hold in practical applications. In this paper, we propose an approach to mining GPs that is free from these assumptions. Given a small number of GPs selected by the user, our algorithm discovers all GPs similar to the user examples. First, a machine learning-style approach is used to find the cores. Second, generalizations of the cores in the graph are computed to identify GPs. Evaluation on synthetic data, generated using real cores and GRs from the biological and web domains, demonstrates the effectiveness of our approach.
Keywords
data mining; graph theory; learning (artificial intelligence); Web pages; Web sites; chemical compound classification task; generalization rules; generalized patterns; graph patterns mining; logical documents; machine learning; patterns structure; Biology computing; Chemical compounds; Chemical technology; Computer science; Data mining; Evolution (biology); HTML; Machine learning algorithms; Proteins
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location
Hong Kong
ISSN
1550-4786
Print_ISBN
0-7695-2701-7
Type
conf
DOI
10.1109/ICDM.2006.108
Filename
4053116