Prediction of Thermal Hazards in a Real Datacenter Room Using Temporal Convolutional Networks

DATE(2021)

Abstract
Datacenters play a vital role in today's society. A datacenter room is a complex controlled environment composed of thousands of computing nodes, which consume kW of power. To dissipate this power, forced air or liquid flow is employed, at a cost of millions of euros per year. Reducing this cost involves free cooling and average-case design, which can create cooling shortages and thermal hazards. When a thermal hazard occurs, the system administrators and the facility manager must stop production to avoid damage to and wear-out of the IT equipment. In this paper, we study the thermal hazard signatures in the monitored data of a Tier-0 datacenter room over a full year of production. We define a set of rules for detecting thermal hazards based on the inlet and outlet temperatures of all nodes in the room. We then propose a custom Temporal Convolutional Network (TCN) to predict the hazards in advance. The results show that our TCN can predict thermal hazards with an F1-score of 0.98 on a randomly sampled test set. When causality is enforced between the training and validation sets, the F1-score drops to 0.74, which calls for in-place online re-training of the network and motivates further research in this context.
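The gap between the two reported F1-scores comes from how the data are split. The sketch below is illustrative only and is not the authors' code: the function names `random_split`, `causal_split`, and `f1_score` are hypothetical, and it assumes samples are indexed in time order. It contrasts a randomly sampled hold-out set with a chronological split in which training never sees samples from after the evaluation period, and shows how an F1-score is computed from binary hazard labels.

```python
import random


def random_split(n_samples, test_fraction, seed=0):
    """Shuffle sample indices, then hold out a fraction for testing.

    Test samples end up interleaved in time with training samples.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def causal_split(n_samples, test_fraction):
    """Hold out the most recent samples so training never sees the future.

    Assumes index order equals time order (an assumption of this sketch).
    """
    n_test = int(n_samples * test_fraction)
    cut = n_samples - n_test
    return list(range(cut)), list(range(cut, n_samples))


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = hazard)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train_r, test_r = random_split(10, 0.2)
    train_c, test_c = causal_split(10, 0.2)
    print("random split test indices:", test_r)
    print("causal split test indices:", test_c)
    print("example F1:", f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

With a random split, each test sample is surrounded in time by training samples, which tends to make the task easier than predicting a genuinely unseen future period. The causal split is the closer proxy for deployment.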
Keywords
HPC, Thermal Hazard, Predictive Model, Thermal Anomaly Detection, Temporal Convolutional Network