DocumentCode :
1791572
Title :
In-advance data analytics for reducing time to discovery
Author :
Jialin Liu ; Yin Lu ; Yong Chen
Author_Institution :
Dept. of Comput. Sci., Texas Tech Univ., Lubbock, TX, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
329
Lastpage :
334
Abstract :
A scientific workflow involves data generation, data analysis, and knowledge discovery. As the data volume of a single simulation run exceeds a few terabytes (TB), the data movement among data generation, data analysis, and knowledge discovery becomes a bottleneck in most scientific big data applications. Our previous work shows that reusing analysis results has significant potential to reduce overlapping data movement between compute nodes and storage nodes. In this work, we propose a new in-advance data analytics method to augment result reuse. The fundamental idea of this method and its prototype system is to predict potentially useful analytics operations by studying users' analysis patterns. The predicted analysis operations are proactively performed on existing data, and the results are stored in an in-memory database for reuse. The evaluation shows that the in-advance data analytics method and its prototype system achieve a 1.2X-6.1X I/O performance speedup with 50% data overlap and a 10%-100% operation recommendation hit rate. The proposed method offers a promising new data reduction solution for big data applications.
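The predict, precompute, and reuse loop described in the abstract can be illustrated with a small sketch. It assumes a first-order Markov model over the user's sequence of operations (the keywords list Markov processes, but the record does not give the paper's actual predictor) and uses a plain Python dict as a stand-in for the in-memory database. The class names, operation names, and dataset names are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical analysis kernels; names are illustrative, not from the paper.
OPERATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean, "max": max, "min": min, "std": pstdev,
}


class MarkovOperationPredictor:
    """Assumed first-order Markov model over a user's analysis operations."""

    def __init__(self) -> None:
        self.transitions: Dict[str, Counter] = defaultdict(Counter)
        self.last_op: Optional[str] = None

    def observe(self, op: str) -> None:
        # Count the transition from the previous operation to this one.
        if self.last_op is not None:
            self.transitions[self.last_op][op] += 1
        self.last_op = op

    def predict(self) -> Optional[str]:
        # Most frequent successor of the last operation, or None if unseen.
        if self.last_op is None or not self.transitions[self.last_op]:
            return None
        return self.transitions[self.last_op].most_common(1)[0][0]


class InAdvanceAnalytics:
    """Hypothetical engine: reuse cached results and precompute predicted ones."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self.data = data  # dataset name -> values (stand-in for stored data)
        self.cache: Dict[Tuple[str, str], float] = {}  # stand-in for in-memory DB
        self.predictor = MarkovOperationPredictor()
        self.hits = 0
        self.requests = 0

    def request(self, op: str, dataset: str) -> float:
        self.requests += 1
        key = (op, dataset)
        if key in self.cache:
            self.hits += 1  # result reused without re-reading the data
            result = self.cache[key]
        else:
            result = OPERATIONS[op](self.data[dataset])
            self.cache[key] = result
        self.predictor.observe(op)
        self._precompute(dataset)
        return result

    def _precompute(self, dataset: str) -> None:
        # Proactively run the predicted next operation on the current dataset.
        nxt = self.predictor.predict()
        if nxt is not None and (nxt, dataset) not in self.cache:
            self.cache[(nxt, dataset)] = OPERATIONS[nxt](self.data[dataset])

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


# Usage: a user who runs "mean" then "std" on each new time step.
engine = InAdvanceAnalytics({
    "t0": [1.0, 2.0, 3.0], "t1": [2.0, 4.0, 6.0], "t2": [0.0, 0.0, 0.0],
})
for step in ["t0", "t1", "t2"]:
    engine.request("mean", step)
    engine.request("std", step)
print(engine.hits, engine.requests)
```

Once the model has seen the "mean then std" pattern, each "std" request on t1 and t2 is served from the precomputed result, giving 2 hits out of 6 requests. A real system would also need an eviction policy for the in-memory store and a way to choose which data the predicted operation should run on; this sketch simply uses the most recently accessed dataset.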
Keywords :
Big Data; data analysis; data reduction; big data applications; data overlapping; data reduction solution; discovery time reduction; in-advance data analytics method; in-memory database; operation recommendation hit rate; prototyping system; users analysis pattern; Big data; Data analysis; Data models; Databases; Kernel; Markov processes; Meteorology; big data; data intensive computing; in-advance data analytics; scientific computing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 IEEE International Conference on Big Data (Big Data)
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004249
Filename :
7004249