DocumentCode :
76379
Title :
Real-Time Semiparametric Regression for Distributed Data Sets
Author :
Luts, Jan
Author_Institution :
SearchParty.com, Surry Hills, NSW, Australia
Volume :
27
Issue :
2
fYear :
2015
fDate :
Feb. 1 2015
Firstpage :
545
Lastpage :
557
Abstract :
This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships, and it addresses both the batch approach, where analysis starts after all data have been collected, and the real-time setting. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real time.
Keywords :
data analysis; distributed databases; real-time systems; regression analysis; MapReduce framework; United States domestic airline data; batch approach; distributed data sets; large-scale data; multiple data owners; nonlinear relationships; real-time setting; semiparametric regression analysis; Adaptation models; Data models; Distributed databases; Organizations; Predictive models; Real-time systems; Vectors; Distributed learning; MapReduce; big data; data streams; evolving environments; real-time; semiparametric regression; variational Bayes;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2334326
Filename :
6847147