DocumentCode :
3657038
Title :
Business data fusion
Author :
Surya Yadav;Gautam Shroff;Ehtesham Hassan;Puneet Agarwal
Author_Institution :
TCS Innovation Labs, New Delhi, India
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1876
Lastpage :
1885
Abstract :
Enterprise business intelligence usually relies on data from multiple sources being carefully joined based on common attributes and consolidated into a common data warehouse. This process is often plagued by difficulties and errors in resolving join-attributes across sources while consolidating information into a data warehouse. Moreover, it may often be impossible to accurately join data from diverse external data sources. Nevertheless, each such data source can still provide useful information on correlations amongst the attributes it captures, and enterprises are increasingly looking to replace the traditional data warehouse with 'data lakes' based on new technology, such as Hadoop, in order to derive statistical insights. We describe an approach for 'business data fusion' applicable in such a scenario: We define 'distributional queries' and their utility in multiple scenarios, including for correlating diverse data sources, and show that these are equivalent to probabilistic inference. In order to efficiently execute such queries, relationships and correlations across data sources are summarized via a Bayesian network, which is learned in an expert-guided manner so as to incorporate domain knowledge. We present empirical results of our approach applied to (a) summarize large volumes of vehicular multi-sensor data in a sensor-data-lake, to efficiently provide probabilistic answers to support engineering analysis without repeatedly accessing the raw data; and (b) demonstrate how potentially diverse and unrelated public and private data sources can nevertheless be approximately and efficiently joined to derive useful statistical insights via distributional queries implemented using Bayesian inference.
Keywords :
"Bayes methods","Probabilistic logic","Databases","Data integration","Approximation methods","Joints","Business"
Publisher :
IEEE
Conference_Title :
Information Fusion (Fusion), 2015 18th International Conference on
Type :
conf
Filename :
7266784