Abstract:
HBase automatically sorts the data in a table by Rowkey, so when organizing massive community data, a timestamp is added to the storage structure to speed up queries. However, HBase's region-splitting mechanism then leads to load imbalance across the cluster. To address this problem, this paper presents a design based on pre-partitioning and hashing: the table is divided in advance into several regions according to the characteristics of the data, and a hash of the Rowkey then maps the data evenly onto those regions. Because each record is stored in every region with equal probability, this approach not only avoids the situation in which a single node is overloaded while other nodes sit idle, but also relieves query pressure on any single node. Practice shows that the pre-partitioning and hash storage mechanisms effectively mitigate the HBase load imbalance caused by storing massive community data.
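The pre-partitioning and hashing idea can be illustrated with a minimal sketch. It is not the implementation used in this paper: the region count, the two-digit bucket prefix, the use of MD5, the `|` separator, and the function names `split_keys`, `salted_rowkey`, and `region_for` are all assumptions made for illustration, and no HBase client is involved.

```python
import bisect
import hashlib

NUM_REGIONS = 8  # assumed number of pre-split regions


def split_keys(num_regions=NUM_REGIONS):
    """Boundary keys for pre-splitting: '01', '02', ... (num_regions - 1 keys)."""
    return [f"{i:02d}" for i in range(1, num_regions)]


def salted_rowkey(entity_id, timestamp_ms, num_regions=NUM_REGIONS):
    """Build a Rowkey whose prefix is a hash bucket of the entity id.

    The bucket prefix spreads different entities over the regions, while the
    zero-padded timestamp keeps one entity's rows sorted by time.
    """
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_regions
    return f"{bucket:02d}|{entity_id}|{timestamp_ms:013d}"


def region_for(rowkey, keys):
    """Index of the region whose key range [start, end) contains rowkey."""
    return bisect.bisect_right(keys, rowkey)


if __name__ == "__main__":
    keys = split_keys()
    counts = [0] * NUM_REGIONS
    for i in range(8000):
        key = salted_rowkey(f"user{i}", 1_700_000_000_000 + i)
        counts[region_for(key, keys)] += 1
    print(counts)  # rows per region; roughly equal if the hash spreads well
```

With a real cluster, boundary keys like these would be supplied when the table is created, for example through the `SPLITS` option of the HBase shell `create` command, so that all regions exist before the first write.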