Title :
Cleaning Framework for Big Data - Object Identification and Linkage
Author :
Hong Liu ; Ashwin Kumar, T.K. ; Thomas, Johnson P.
Author_Institution :
Dept. of Comput. Sci., Oklahoma State Univ., Stillwater, OK, USA
Abstract :
Data is a valuable resource. The proper use of high-quality data can help make better predictions, analyses, and decisions, whereas poor-quality data is detrimental to data analytics. Data from different sources may describe the same entities under different identities. This becomes a particular concern when large-scale heterogeneous data from multiple sources are integrated for other purposes. This paper aims to identify identical or similar objects and link these associated objects together so that the data can be cleaned and combined efficiently. Our research harnesses both the context and the usage patterns of data items to determine relationships among objects. Our experimental results show that efficient linkage among multiple sources can be constructed using context and usage patterns.
Keywords :
Big Data; data analysis; pattern classification; cleaning framework; context pattern; data analytics; object identification; object linkage; usage pattern; Cleaning; Context; Couplings; Generators; Markov processes; Object recognition; Data Cleaning; Data Context; Object Identification; Object Linkage; Usage Patterns;
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.38