Abstract :
This paper introduces DisGR (Distributed Graph Repository), a system designed and developed to support research related to the Chinese Web. The system is built on a graph data model, TGM (Tagged Graph Model), designed for representing Web data, especially forum and BBS data. DisGR supports the query language TGM-L, which targets analytical tasks over TGM data. For high scalability and availability, DisGR is designed for clusters with a shared-nothing architecture. Its characteristics include column-based storage, descriptive language support, and flexible support for user-defined functions. DisGR differs from other database systems with a similar purpose in three respects. First, the catalog is maintained by a set of servers connected via a DHT overlay. Second, signatures of different granularities are used for data distribution and query optimization. Third, updates are supported via timestamps and regular reorganization.
Keywords :
Internet; data handling; interactive systems; storage management; data intensive Web application; data prefetching; index support; index-based data access; interactive Web application; massive web data management; pipelined data processing; storage framework; Catalogs; Data models; Graphical models; Web search