Title :
Modeling randomized data streams in caching, data processing, and crawling applications
Author :
Ahmed, Sarker Tanzir; Loguinov, Dmitri
Author_Institution :
Texas A&M Univ., College Station, TX, USA
fDate :
April 26, 2015 - May 1, 2015
Abstract :
Many Big Data applications (e.g., MapReduce, web caching, and search in large graphs) process streams of random key-value records whose keys follow highly skewed frequency distributions. In this work, we first develop stochastic models for the probability of encountering unique keys during exploration of such streams and for the growth rate of the number of unique keys over time. We then apply these models to the analysis of LRU caching, MapReduce overhead, and various crawl properties (e.g., node-degree bias and frontier size) in random graphs.
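To make "probability of encountering unique keys" and "growth rate" concrete, the sketch below uses the textbook occupancy calculation for a stream whose keys are drawn i.i.d. from a fixed distribution p_i. It is an illustrative assumption, not the authors' model: the Zipf distribution, the i.i.d. (independent reference) assumption, and the helper names `zipf_probs`, `expected_unique`, `prob_new_key`, and `simulate_unique` are all hypothetical. After t records, the expected number of unique keys is E[U_t] = sum_i [1 - (1 - p_i)^t], and the probability that record t is a previously unseen key, which equals the per-step growth E[U_t] - E[U_{t-1}], is sum_i p_i (1 - p_i)^(t-1).

```python
import random


def zipf_probs(n, alpha):
    """Hypothetical skewed key distribution: normalized Zipf(alpha) over n keys."""
    weights = [1.0 / (k ** alpha) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_unique(probs, t):
    """Expected number of distinct keys after t i.i.d. draws (occupancy formula)."""
    return sum(1.0 - (1.0 - p) ** t for p in probs)


def prob_new_key(probs, t):
    """Probability that the t-th draw (t >= 1) is a key not seen in draws 1..t-1.

    This equals expected_unique(probs, t) - expected_unique(probs, t - 1),
    i.e. the expected growth rate of the unique-key count at step t.
    """
    return sum(p * (1.0 - p) ** (t - 1) for p in probs)


def simulate_unique(probs, t, rng):
    """Monte Carlo check: draw t keys i.i.d. from probs and count distinct ones."""
    keys = rng.choices(range(len(probs)), weights=probs, k=t)
    return len(set(keys))


if __name__ == "__main__":
    rng = random.Random(42)
    probs = zipf_probs(10_000, 1.0)  # assumed skew; real streams may differ
    for t in (100, 1_000, 10_000):
        trials = 20
        sim = sum(simulate_unique(probs, t, rng) for _ in range(trials)) / trials
        print(f"t={t}: model E[U_t]={expected_unique(probs, t):.1f}, "
              f"simulated={sim:.1f}, P(new key at t)={prob_new_key(probs, t):.3f}")
```

Under heavy skew, the unique-key count grows sublinearly in t because most draws hit a few popular keys; the closed form and the simulation should agree closely when the i.i.d. assumption holds.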
Keywords :
Big Data; cache storage; information retrieval; parallel processing; stochastic processes; Big Data applications; LRU caching; MapReduce overhead; caching application; crawl properties; crawling application; data processing; frequency distribution; probability; random graphs; randomized data streams; stochastic model; Analytical models; Computational modeling; Computers; Conferences; Random variables; Stochastic processes;
Conference_Title :
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location :
Kowloon
DOI :
10.1109/INFOCOM.2015.7218542