Smart data center energy monitoring: a thermal-aware design approach to ‘Green IT’

Contact persons at EPFL: Dr Martino Ruggiero, Prof. David Atienza.

Contact persons at Credit Suisse: Mr. Marcel Ledergerber, Mr. Hans Martin Graf, Mr. Peter Elste.

Introduction
Power management in data centers is attracting growing interest, driven by real concerns about the energy usage and cost of modern computing systems. Data center computing applications and platforms have typically been designed without regard to power consumption. With increased awareness of energy cost, tracking and managing power consumption is now an issue even for compute-intensive server clusters.

Motivations, purpose and ultimate goals of the project
Innovative decision support systems for data center management are needed. ESL and Credit Suisse are building a tool called PMSM (Power Monitor System and Management) to log and monitor the power consumption of the machines in data centers. Building and administering data centers is becoming increasingly complex. IT infrastructure managers have to optimize data center utilization and costs under several constraints arising from heterogeneous and often conflicting factors: customer requirements, infrastructure costs, energy costs, available physical space, etc.

Credit Suisse wishes to log and monitor the power consumption of the machines in its data centers. The main goal is to raise the utilization of the data center infrastructure space to a much higher level. The ultimate goal of this project is to develop an innovative instrument that facilitates system administration and optimizes data center utilization. This will enable several technical and managerial advances, such as:

  • Higher utilization of the rack space with servers;
  • Alerts in case of a circuit breaker failure in the rack;
  • Alerts in case of power overload in a rack;
  • Reports in case of asymmetrical load in a dual power environment;
  • Historical power consumption logging.
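
As a rough illustration of the asymmetrical-load report, the sketch below compares the two power feeds of a rack and lists the racks whose imbalance exceeds a tolerance. The names `RackFeeds`, `asymmetry_ratio`, and `asymmetry_report`, the imbalance metric, and the 20% tolerance are assumptions made for this example; they do not describe how PMSM is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class RackFeeds:
    """Hypothetical snapshot of the two power feeds of one rack, in kW."""
    rack_id: str
    feed_a_kw: float
    feed_b_kw: float


def asymmetry_ratio(rack: RackFeeds) -> float:
    """Return |A - B| / (A + B): 0.0 is balanced, 1.0 means one feed carries all load."""
    total = rack.feed_a_kw + rack.feed_b_kw
    if total <= 0.0:
        return 0.0  # idle rack: nothing to report
    return abs(rack.feed_a_kw - rack.feed_b_kw) / total


def asymmetry_report(racks, tolerance=0.2):
    """List the ids of racks whose feed imbalance exceeds the assumed tolerance."""
    return [r.rack_id for r in racks if asymmetry_ratio(r) > tolerance]
```

A report could then be produced with `asymmetry_report([RackFeeds("R1", 2.0, 2.0), RackFeeds("R2", 3.0, 1.0)])`, which would flag only the second rack.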

The same toolset will enable further administration features, such as:

  • Alerts in case of exceeding the maximum rack capacity (kW / A);
  • Alerts in case of exceeding the maximum single power feed to a rack (kW / A);
  • Alerts in case of a sudden drop or unusual behavior of a single phase or single power connection;
  • Alerts in case of loss of connectivity to power sensors or to DAC.
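
The alert conditions listed above could be checked along the following lines. This is only a minimal sketch: the `RackLimits` type, the `check_rack` function, the use of `None` to mark an unreachable sensor, and the 50% drop threshold are all hypothetical choices for illustration, and only kW limits are modeled (not amperes).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RackLimits:
    max_rack_kw: float  # assumed maximum total rack capacity
    max_feed_kw: float  # assumed maximum load on a single power feed


def check_rack(feeds_kw: List[Optional[float]],
               limits: RackLimits,
               previous_kw: Optional[List[Optional[float]]] = None,
               drop_fraction: float = 0.5) -> List[str]:
    """Return alert messages for one rack; None marks a sensor that did not answer."""
    alerts = []
    if any(v is None for v in feeds_kw):
        alerts.append("connectivity lost")
    # Only feeds that reported a value count towards the rack total.
    if sum(v for v in feeds_kw if v is not None) > limits.max_rack_kw:
        alerts.append("rack capacity exceeded")
    for i, value in enumerate(feeds_kw):
        if value is None:
            continue
        if value > limits.max_feed_kw:
            alerts.append(f"feed {i} over limit")
        # Sudden drop: the reading fell by more than drop_fraction since the last sample.
        if previous_kw is not None and i < len(previous_kw):
            prev = previous_kw[i]
            if prev is not None and prev > 0 and value < prev * (1 - drop_fraction):
                alerts.append(f"feed {i} sudden drop")
    return alerts
```

Keeping every check in one function that returns plain messages makes it easy to run against each new sample and forward the result to whatever notification channel is in use.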

[Figure: thermal model of data center racks]
