Title :
Provenance-based Indexing Support in Micro-blog Platforms
Author :
Yao, Junjie ; Cui, Bin ; Xue, Zijun ; Liu, Qingyun
Author_Institution :
Dept. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Recently, many micro-blog message-sharing applications have emerged on the web. Users can publish short messages freely and receive updates from their subscriptions instantly. Prominent examples include Twitter, Facebook's statuses, and Sina Weibo in China. Micro-blog platforms have become a useful service for real-time information creation and propagation. However, the short length and dynamic nature of these messages pose great challenges for effective content understanding. Additionally, noise and fragmentation make it difficult to discover the temporal propagation trail and to explore how micro-blog messages develop. Provenance refers to data origin identification and transformation logging (monitoring), and has been demonstrated to be of great value in database and workflow systems. In this paper, we propose a provenance model for micro-blog platforms that captures the connections and interactions between messages, and we design an indexing scheme to support provenance-based message discovery and maintenance. To cope with the real-time deluge of micro-messages, we introduce a novel virtual annotation grouping approach to encode and maintain the provenance information. Furthermore, we design a summary index and adaptive pruning strategies to facilitate efficient provenance updating. Based on this provenance index, our approach supports rich query retrieval and intuitive message tracking for effective message organization in micro-blog systems. Experiments conducted on real data verify the effectiveness and efficiency of our approach.
Keywords :
Internet; electronic messaging; indexing; query processing; social networking (online); China; Facebook statuses; Sina Weibo; Twitter; World Wide Web; adaptive pruning strategies; data origin identification; database; message grouping approach; message organization; micro-blog message sharing applications; micro-blog messages; micro-blog platforms; provenance-based indexing support; real time information creation; real time information propagation; real time micro-message deluge; rich query retrieval; short messages; support provenance-based message discovery; temporal propagation trail; transformation logging; transformation monitoring; workflow systems; Blogs; Context; Indexing; Media; Noise; Twitter;
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.36