Title :
Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks
Author :
Baoxu Shi ; Weninger, Tim
Author_Institution :
Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN, USA
Abstract :
Meta-paths in heterogeneous information networks are almost always created by hand and have, so far, only been attempted on data sets with very small type systems such as DBLP and IMDB. Most real-world heterogeneous information networks have large and complex type systems. As the size and complexity of the type system grow, it becomes increasingly difficult for humans to form reasonable meta-path queries. This work introduces a new technique to discover what we call interesting meta-paths from complex heterogeneous information networks. Our interestingness measure is based on classical knowledge discovery principles, but has been applied in such a way that only interesting meta-paths are mined from the hundreds of thousands of possible choices. As in the classical pattern mining literature, precision and recall statistics are difficult to obtain; instead, we evaluate the effectiveness of our results using a quantitative node-similarity analysis as well as a large user study. Finally, we apply the newly discovered interesting meta-paths to find similar nodes in the Wikipedia heterogeneous information network.
Keywords :
Web sites; complex networks; data mining; information networks; pattern clustering; query processing; statistics; DBLP; IMDB; Wikipedia; complex heterogeneous information networks; complex type systems; knowledge discovery; meta-path queries; meta-paths mining; pattern mining; recall statistics; Data mining; Educational institutions; Electronic publishing; Encyclopedias; Gold; Internet; meta-paths; similarity
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
DOI :
10.1109/ICDMW.2014.25