  • DocumentCode
    76379
  • Title
    Real-Time Semiparametric Regression for Distributed Data Sets
  • Author
    Luts, Jan
  • Author_Institution
    SearchParty.com, Surry Hills, NSW, Australia
  • Volume
    27
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 1 2015
  • Firstpage
    545
  • Lastpage
    557
  • Abstract
    This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships, and it addresses both the batch approach, where analysis starts after all data have been collected, and the real-time setting. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real time.
  • Keywords
    data analysis; distributed databases; real-time systems; regression analysis; MapReduce framework; United States domestic airline data; batch approach; distributed data sets; large-scale data; multiple data owners; nonlinear relationships; real-time setting; semiparametric regression analysis; Adaptation models; Data models; Distributed databases; Organizations; Predictive models; Real-time systems; Vectors; Distributed learning; MapReduce; big data; data streams; evolving environments; real-time; semiparametric regression; variational Bayes;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2334326
  • Filename
    6847147