Abstract:
This paper advocates building a Microblogs Data Management System (MDMS): an end-to-end data management system that supports indexing, querying, and analyzing microblogs, e.g., tweets, comments, or check-ins. We identify a set of characteristics that distinguish microblogging environments from any other data management environment. We then propose a system architecture for the first Microblogs Data Management System, which includes indexing, querying, and recovery components. The indexing component indexes recent data in memory and older data on disk, and synchronizes the flow of data from memory to disk without affecting query response time. The querying component retrieves query answers from both memory and disk storage and employs online selectivity estimation techniques tuned to the behavior of microblog data. The recovery component allows incoming microblogs to be stored and processed efficiently in memory without the risk of data loss.
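To make the memory/disk split concrete, the following is a minimal sketch of a two-tier keyword index, not the system's actual design. The class `TwoTierIndex`, its methods, and the flush-everything policy are all hypothetical, the "disk" tier is simulated with an in-process dictionary, and microblogs are assumed to arrive in timestamp order. The sketch does not attempt the concurrent memory-to-disk synchronization, selectivity estimation, or recovery described above.

```python
from collections import defaultdict


class TwoTierIndex:
    """Hypothetical keyword index: recent postings in memory, older ones on 'disk'."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity  # max microblogs held in memory
        self.memory = defaultdict(list)  # keyword -> [(timestamp, text)]
        self.disk = defaultdict(list)    # stand-in for on-disk storage (assumption)
        self.memory_count = 0            # microblogs currently in memory

    def insert(self, timestamp, text):
        # Index the microblog under each distinct keyword it contains.
        for keyword in set(text.lower().split()):
            self.memory[keyword].append((timestamp, text))
        self.memory_count += 1
        if self.memory_count > self.memory_capacity:
            self.flush()

    def flush(self):
        # Simplistic policy (assumption): move all in-memory postings to disk.
        for keyword, postings in self.memory.items():
            self.disk[keyword].extend(postings)
        self.memory.clear()
        self.memory_count = 0

    def query(self, keyword, k):
        # Return the k most recent microblogs containing the keyword.
        # Memory holds the newest data, so disk is consulted only if needed.
        kw = keyword.lower()
        hits = sorted(self.memory.get(kw, []), reverse=True)
        if len(hits) < k:
            hits += sorted(self.disk.get(kw, []), reverse=True)
        return hits[:k]
```

For example, with `memory_capacity=2`, inserting a third microblog triggers a flush, and a later top-2 query for a shared keyword merges the newest in-memory posting with the most recent flushed one.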