Abstract:
Web applications are increasingly migrating to the cloud owing to higher scalability, lower cost, and reduced time-to-market. For example, Amazon Web Services (AWS) hosts PBS, Reddit, Netflix, and Zynga. Although the elasticity of the cloud enables a cluster to scale up and down in response to incoming traffic, it makes performance modeling and analysis non-trivial. In the context of Java-based web applications, a key aspect is the performance of the garbage collector (GC). Existing tools for analyzing GC performance are tailored to a single Java process and are therefore not suitable for use in the cloud. To this end, we present a tool called Shrek for analyzing GC performance in the cloud. Shrek facilitates analysis of the GC logs of Java applications deployed across a cluster of hundreds of nodes in the cloud. Further, it supports analytics, such as time series analysis of GC performance metrics to determine "bad" nodes, and visualization of, for example, the promotion rate from the young generation to the old generation. Shrek has already been used to diagnose performance problems for multiple applications at Netflix.
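To make the idea of determining "bad" nodes from GC metrics concrete, the following is a minimal sketch and not Shrek's actual method, which the abstract does not describe. It assumes per-node promotion-rate samples have already been extracted from GC logs. The function names, the node labels, and the choice of a modified z-score based on the median absolute deviation (MAD) are all illustrative assumptions.

```python
from statistics import median


def promotion_rate(promoted_bytes, interval_seconds):
    """Bytes promoted from the young to the old generation per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return promoted_bytes / interval_seconds


def flag_bad_nodes(rates_by_node, threshold=3.5):
    """Return nodes whose mean promotion rate is a robust outlier.

    Hypothetical heuristic: compare each node's mean rate to the cluster
    median using a modified z-score built on the MAD. The 3.5 cutoff is a
    common rule of thumb, not a value taken from the paper.
    """
    # Mean rate per node, skipping nodes with no samples.
    means = {node: sum(r) / len(r) for node, r in rates_by_node.items() if r}
    if not means:
        return []
    center = median(means.values())
    mad = median(abs(m - center) for m in means.values())
    if mad == 0:
        # No measurable spread across the cluster; nothing can be ranked.
        return []
    return sorted(
        node
        for node, m in means.items()
        if 0.6745 * abs(m - center) / mad > threshold
    )


# Illustrative data only (MB/s per node); not measurements from the paper.
cluster = {
    "node-a": [10.0, 12.0],
    "node-b": [12.0, 14.0],
    "node-c": [8.0, 10.0],
    "node-d": [95.0, 105.0],
    "node-e": [11.0, 13.0],
}
bad = flag_bad_nodes(cluster)
```

A median-based score is used here because a single misbehaving node would inflate a mean and standard deviation and could mask itself; other detectors would be equally plausible.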
Keywords:
Java; cloud computing; data visualisation; storage management; time series; GC performance analysis; GC performance metrics; Java-based Web applications; Netflix; Shrek; bad node determination; cloud; garbage collector; practical garbage collection analysis; time series analysis; visualization; Hardware; Time-frequency analysis; Tuning; Cloud Performance; Garbage Collection