Title :
Optimize datacenter management with multi-tier thermal-intelligent workload placement
Author :
Chuan Song ; Chun Wang ; Ahuja, Nishi ; Xiang Zhou ; Daniel, Abishai
Author_Institution :
Intel Corp., Shanghai, China
Abstract :
Rapid growth of internet services and mobile devices has led to more and larger cloud data centers. A hyperscale cloud data center consumes an enormous amount of electricity, which puts pressure on operating cost and infrastructure management. The industry has made great progress in improving power usage effectiveness (PUE) through innovation and infrastructure upgrades. Recent research focuses on dynamically adjusting workload placement according to real-time power and thermal telemetry from the datacenter infrastructure, as in the thermal awareness scheduler (TAS), to relieve pressure on datacenter power and cooling and to improve PUE. To achieve higher density and a longer lifecycle for high-value computing components, the conventional rack-mount server system is evolving into the rack scale server system, in which power and cooling units move to the rack level and are shared by multiple server systems, as in the Facebook-led Open Compute Project and Project Scorpio, developed by the Chinese internet giants Baidu, Alibaba, and Tencent. Traditional thermal-aware workload placement assumes that all server systems within a cluster are uniform, each with discrete power and cooling units, and that there is no power or thermal correlation between different server systems. However, with power and cooling units moving to the rack level, the power and thermal correlation between server systems must be considered when calculating the optimal workload placement. To address these challenges, this paper proposes a framework for multi-tier thermal-intelligent workload placement and corresponding thermal management algorithms adapted to rack scale server systems with shared power and cooling units. The paper evaluates these thermal management algorithms in terms of performance, benefits, and usage scenarios.
The prototype and experiments introduced in this paper run on an OpenStack-managed cluster, but the thermal-intelligent workload placement and corresponding thermal management policies aim to provide a common framework that applies beyond the cloud OS, for example to big data software stacks and even to customers' distributed computing and storage systems.
Keywords :
cloud computing; cooling; costing; electronic engineering computing; thermal analysis; thermal management (packaging); Alibaba; Baidu; Chinese Internet giants; Internet services; OpenStack managed cluster; PUE; TAS; Tencent; big data software stacks; cloud OS; customer distributed computing; discrete power and cooling units; electricity; facebook-led open compute project scorpio; high value computing component; hyper scale cloud data center; infrastructure management; innovation upgrade; mobile devices; multi server systems; multitier thermal-intelligent workload placement; operation cost; optimize data center management; power usage effectiveness improvement; pressure reduction; rack scale server system; rack-mount server system; real-time power; storage systems; thermal awareness scheduler; thermal correlation; thermal management algorithms; thermal telemetry; Cooling; Matched filters; Servers; Software; Temperature; Temperature sensors; Thermal management; OpenStack; PTAS (Power and Thermal Awareness Solution); SDI (software defined infrastructure); Thermal Management; cloud computing; rack scale server; thermal awareness scheduling; thermal intelligence; workload provision and consolidation;
Conference_Title :
Thermal Measurement, Modeling & Management Symposium (SEMI-THERM), 2015 31st
Conference_Location :
San Jose, CA
DOI :
10.1109/SEMI-THERM.2015.7100134