Title : 
Incremental Sorting for Large Dynamic Data Sets
         
        
            Author : 
Aydin, Ahmet Arif ; Anderson, Kenneth M.
         
        
            Author_Institution : 
Dept. of Comput. Sci., Univ. of Colorado, Boulder, CO, USA
         
        
        
            fDate : 
March 30 2015-April 2 2015
         
        
        
        
            Abstract : 
In today's world of pervasive computing, it is straightforward for organizations to generate large amounts of data in support of a variety of business needs. For this reason, it is important to build tools that allow analysts to manage and investigate these data sets quickly and efficiently. One feature needed by these tools is the ability to sort large amounts of data along a number of dimensions to facilitate the search for useful information. In this paper, we describe a new method for incrementally sorting large, multi-dimensional, dynamic data sets. Our particular use case involves sorting large Twitter data sets, but our technique can be applied more generally across a variety of data types. Our approach is evaluated with respect to its scalability and by comparing it to several alternatives. It is currently able to efficiently sort data sets consisting of tens of millions of tweets along a variety of dimensions, even when the data set is under active collection and new tweets are being added each day. The approach incrementally integrates the new tweets and provides sorted views of all tweets along various dimensions without having to re-sort the previously sorted tweets. The paper presents the benefits of the technique, discusses its limitations, and describes its software engineering contributions.
         
        
            Keywords : 
business data processing; social networking (online); software engineering; sorting; ubiquitous computing; Twitter data; business needs; incremental sorting; large dynamic data sets; multidimensional dynamic data sets; organizations; pervasive computing; software engineering contributions; sorted tweets; Browsers; Data analysis; Indexes; Scalability; Sorting; Twitter; big data; dynamic data sets; incremental sorting;
         
        
        
        
            Conference_Title : 
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
         
        
            Conference_Location : 
Redwood City, CA
         
        
        
            DOI : 
10.1109/BigDataService.2015.35