DocumentCode
633068
Title
Learning Classifiers from Chains of Multiple Interlinked RDF Data Stores
Author
Lin, H.T. ; Honavar, V.
Author_Institution
Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
fYear
2013
fDate
June 27 2013-July 2 2013
Firstpage
94
Lastpage
101
Abstract
The emergence of many interlinked, physically distributed, and autonomously maintained RDF stores offers unprecedented opportunities for predictive modeling and knowledge discovery from such data. However, existing machine learning approaches are limited in their applicability because it is neither desirable nor feasible to gather all of the data in a centralized location for analysis, owing to access, memory, bandwidth, and computational restrictions, and sometimes privacy and confidentiality constraints. Against this background, we consider the problem of learning predictive models from multiple interlinked RDF stores. Specifically, we: (i) introduce statistical query based formulations of several representative algorithms for learning classifiers from RDF data, (ii) introduce a distributed learning framework to learn classifiers from multiple interlinked RDF stores that form a chain, (iii) identify three special cases of RDF data fragmentation and describe effective strategies for learning predictive models in each case, (iv) consider a novel application of a matrix reconstruction technique from the field of Computerized Tomography [1] to approximate the statistics needed by the learning algorithm from projections using count queries, thus dramatically reducing the amount of information transmitted from the remote data sources to the learner, and (v) report results of experiments with a real-world social network data set (Last.fm), which demonstrate the feasibility of the proposed approach.
Keywords
data mining; distributed processing; learning (artificial intelligence); matrix algebra; pattern classification; query processing; RDF data fragmentation; computerized tomography; distributed learning framework; knowledge discovery; learning classifiers; learning predictive models; machine learning; matrix reconstruction technique; multiple interlinked RDF data stores; predictive modeling; remote data sources; statistical query based formulations; Computed tomography; Distributed databases; Image reconstruction; Predictive models; Resource description framework; Subspace constraints; Vectors; RDF; SPARQL; classifier; distributed learning; linked data; supervised learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location
Santa Clara, CA
Print_ISBN
978-0-7695-5006-0
Type
conf
DOI
10.1109/BigData.Congress.2013.22
Filename
6597124