Title :
Real-Time Semiparametric Regression for Distributed Data Sets
Author_Institution :
SearchParty.com, Surry Hills, NSW, Australia
Abstract :
This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships, and it addresses both the batch approach, where analysis starts after all data have been collected, and the real-time setting. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real time.
Keywords :
data analysis; distributed databases; real-time systems; regression analysis; MapReduce framework; United States domestic airline data; batch approach; distributed data sets; large-scale data; multiple data owners; nonlinear relationships; real-time setting; semiparametric regression analysis; Adaptation models; Data models; Distributed databases; Organizations; Predictive models; Real-time systems; Vectors; Distributed learning; MapReduce; big data; data streams; evolving environments; real-time; semiparametric regression; variational Bayes;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2014.2334326