Title :
Separating Performance Anomalies from Workload-Explained Failures in Streaming Servers
Author :
Cunha, Carlos Augusto ; Silva, Luis Moura E
Abstract :
Video-streaming services dominate the Internet, delivering content for video-on-demand, TV, education and collaborative work. Service parameters governing the quality and continuity of video content are especially important, because viewers are sensitive to variations in video quality and traditional TV users have absorbed decades of quality expectations. Performance analysis and repair at both the server and network levels are therefore essential to avoid degrading the user experience. At the network level, several effective techniques exploit temporal and spatial data redundancy, but they depend heavily on healthy servers with enough resources to handle both the client workload and the recovery workload. Excessive streaming workloads and performance anomalies (i.e., server resource exhaustion not explained by client requests) are typical causes of server performance failures. The former is often caused by in-memory caching of popular videos, which changes the number of requests the server can accept and consequently makes load-admission mechanisms unreliable when the workload changes. The latter stems from server-internal factors independent of the client workload (e.g., memory leaks and maintenance activities). Separating workload-related failures from performance anomalies is essential for selecting immediate repair actions, for capacity planning and for supporting fault repair. We evaluated the Naive Bayes and C4.5 decision-tree algorithms for classifying these failure states using client and server performance metrics. The results show that the type of failure can be predicted with recall and accuracy above 90% for workloads with different popularity levels.
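The abstract does not state which features, Naive Bayes variant or tooling were used, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a Gaussian Naive Bayes classifier (one common variant, chosen here for illustration) to label a failure as "workload" (explained by client load) or "anomaly" (resource exhaustion not explained by client requests). The three features (active client sessions, CPU utilization in percent, resident memory in MB) and all training rows are hypothetical toy values, not data from the paper.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: class priors plus per-feature normal likelihoods."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, members in grouped.items():
            self.priors[label] = len(members) / len(rows)
            stats = []
            for column in zip(*members):  # iterate feature by feature
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                stats.append((mean, var + 1e-9))  # small floor avoids division by zero
            self.stats[label] = stats
        return self

    def _log_posterior(self, row, label):
        # log P(label) + sum of log N(x; mean, var), assuming feature independence
        score = math.log(self.priors[label])
        for x, (mean, var) in zip(row, self.stats[label]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, row):
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


# Hypothetical features: [active client sessions, CPU %, resident memory MB]
train_rows = [
    [480, 95, 900], [510, 97, 950], [450, 92, 880], [530, 98, 970],      # high load
    [120, 60, 1900], [100, 55, 2000], [140, 65, 1850], [90, 50, 1950],   # e.g. a leak
]
train_labels = ["workload"] * 4 + ["anomaly"] * 4

model = GaussianNaiveBayes().fit(train_rows, train_labels)
print(model.predict([500, 96, 920]))   # many sessions: failure explained by load
print(model.predict([110, 58, 1980]))  # few sessions, high memory: anomaly
```

Working in log space keeps the product of per-feature likelihoods numerically stable; a C4.5 tree, the other algorithm named in the abstract, would instead split on the features with the highest gain ratio.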
Keywords :
Bayes methods; Internet; client-server systems; content management; decision trees; digital video broadcasting; failure analysis; groupware; redundancy; security of data; spatiotemporal phenomena; video on demand; video servers; video streaming; C4.5 tree algorithm; Naive Bayes algorithm; TV; blur load admittance mechanism; client-server performance metrics; collaborative work; education; failure state classification; network level; performance anomalies; recovery workload; server level; server performance failure; spatial data redundancy; temporal data redundancy; video content delivery; video quality; video streaming server; workload explained failure; Benchmark testing; Delay; Encoding; Maintenance engineering; Servers; Streaming media; Dependability; Performance Anomalies; Streaming;
Conference_Title :
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Ottawa, ON
Print_ISBN :
978-1-4673-1395-7
DOI :
10.1109/CCGrid.2012.58