Title : 
Cost and data exploration considerations for big data prediction on the cloud
         
        
            Author : 
Chris Tseng;Tien Nguyen;Chetan Sharma
         
        
            Author_Institution : 
Computer Science Dept., San Jose State University
         
        
        
        
        
            Abstract : 
Cloud services allow one to perform intensive big data computations without personally owning a sufficiently powerful machine. Different cloud-based virtual machines, however, offer different processor speeds at different costs, and the most cost-effective machine size is not always obvious. We investigated different virtual machine sizes on the Microsoft Azure cloud service, as well as different data exploration methodologies, to solve a big data prediction problem using neural networks. We found that higher-end, more expensive virtual machine settings do not always deliver proportionally better performance. Applying a neural network directly to a prediction problem typically runs into a performance bottleneck. We found that learning and prediction can be improved by taking the properties of the data and the nature of the problem into consideration. Some of our data preparation schemes will be useful for general big data prediction problems with noisy or non-uniformly distributed data.
         
        
            Keywords : 
"Neural networks","Training","Big data","Virtual machining","Cloud computing","Hardware","Performance analysis"
         
        
        
            Conference_Title : 
2015 IEEE International Conference on Big Data (Big Data)
         
        
        
            DOI : 
10.1109/BigData.2015.7363930