Title : 
Matrix-Query: A Distributed SQL-Like Query Processing Model for Large Database Clusters
         
        
            Author : 
Qiao Liu ; Ping Ji ; Yuan Zuo
         
        
            Author_Institution : 
Dept. of Comput. Sci. & Eng., BeiHang Univ., Beijing, China
         
        
        
        
        
        
            Abstract : 
With the development of distributed computing and the rapid growth of data, scientific research increasingly requires the support of a high-efficiency relational data processing framework. Based on the characteristics of scientific data, such as bulk inserts and infrequent changes, this paper proposes a streaming processing model called Matrix-Query, together with a matching data storage architecture for relational queries. By transforming the original relational schema into entities and key-value indexing, the data storage solution provides more localized operations and data positioning. Compared to the traditional Map-Reduce model, Matrix-Query isolates subtasks from one another's influence so that they execute in a streaming and parallel manner, and it reduces the negative impact of writing intermediate files. We also optimize the data structure and subtask management to improve the performance of Matrix-Query. The experimental results demonstrate the performance advantage of Matrix-Query over two well-known data processing systems, Hive and HadoopDB, both of which are built on top of the Map-Reduce model.
         
        
            Keywords : 
SQL; database indexing; distributed databases; natural sciences computing; query processing; relational databases; very large databases; HadoopDB; Hive; Map-Reduce model; Matrix-Query; bulk insert; data positioning; data processing system; data storage architecture; data structure optimization; distributed SQL-like query processing model; distributed computation; high-efficiency relational data processing framework; key-value indexing; large database clusters; localization operation; relational query; relational schema; scientific data; streaming processing model; subtask management; Computational modeling; Data models; Indexing; Memory; relational query processing model
         
        
        
        
            Conference_Title : 
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2013 International Conference on
         
        
            Conference_Location : 
Beijing
         
        
        
            DOI : 
10.1109/CyberC.2013.36