Power-aware acceleration of Deep Learning (DL) training and inference on High-Performance Computing (HPC) servers

Deep Convolutional Neural Networks (DCNNs) play a significant role in many application domains, such as computer vision, medical imaging, and advanced signal processing and classification. Nonetheless, designing a DCNN that outperforms the state of the art is a manual, challenging, and time-consuming task because the design space is extremely large, a consequence of the large number of layers and their corresponding hyperparameters.

In this research line, we tackle this problem by automating the exploration of the DCNN design space. To that end, we develop new techniques for hyperparameter optimization of DCNNs through different machine-learning-based approaches. In particular, we aim to completely eliminate the human effort in the hyperparameter optimization process by exploiting different Multi-Agent Reinforcement Learning (MARL)-based approaches. The proposed MARL-based approaches are data-driven and can consider an arbitrary set of design objectives and constraints for different DCNNs.

So far, our results show that Q-learning methods can be adapted to define one learning agent per layer of the DCNN, splitting the optimization design space into smaller, independent design sub-spaces. Each agent can then fine-tune the hyperparameters of its assigned layer with respect to a global reward. Moreover, in this research line we also explore how to form the Q-tables, along with new update rules for communication among the agents.
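The per-layer agent idea can be illustrated with a minimal Python sketch. It is not the implementation from the publications below: the search space (`CHOICES`), the `LayerAgent` class, the state encoding, and the `toy_global_reward` function are all hypothetical, and the reward is a cheap stand-in for actually training and evaluating a DCNN. The sketch only shows how several tabular Q-learning agents, one per layer, can each update their own Q-table from a single shared reward.

```python
import random

# Hypothetical per-layer search space: candidate filter counts for a layer.
CHOICES = [16, 32, 64, 128]


class LayerAgent:
    """Tabular Q-learning agent that tunes one layer's hyperparameter."""

    def __init__(self, n_choices, alpha=0.1, gamma=0.9, epsilon=0.2, rng=None):
        self.n = n_choices
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # Assumed encoding: state = currently selected choice index,
        # action = choice index to switch to.
        self.q = [[0.0] * n_choices for _ in range(n_choices)]
        self.state = 0

    def act(self):
        # Epsilon-greedy selection over this agent's own Q-table row.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        row = self.q[self.state]
        return row.index(max(row))

    def update(self, action, reward):
        # Standard Q-learning update; the chosen action becomes the next state.
        td_target = reward + self.gamma * max(self.q[action])
        self.q[self.state][action] += self.alpha * (
            td_target - self.q[self.state][action]
        )
        self.state = action


def toy_global_reward(filters):
    """Stand-in for 'train and evaluate the network' (assumption, no real training)."""
    accuracy_proxy = sum(min(f, 64) for f in filters) / (64 * len(filters))
    size_penalty = sum(filters) / (128 * len(filters))
    return accuracy_proxy - 0.5 * size_penalty


def search(n_layers=3, episodes=500, seed=0):
    rng = random.Random(seed)
    agents = [LayerAgent(len(CHOICES), rng=rng) for _ in range(n_layers)]
    best_cfg, best_reward = None, float("-inf")
    for _ in range(episodes):
        actions = [agent.act() for agent in agents]
        config = [CHOICES[a] for a in actions]
        reward = toy_global_reward(config)  # one global reward for all agents
        for agent, action in zip(agents, actions):
            agent.update(action, reward)
        if reward > best_reward:
            best_cfg, best_reward = config, reward
    return best_cfg, best_reward


if __name__ == "__main__":
    print(search())
```

In this sketch the agents do not communicate beyond the shared reward; the Q-table construction and inter-agent update rules mentioned above would replace the plain `update` step.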

Our results so far show that we can successfully apply our MARL-based solutions to different well-known DCNNs, including GoogLeNet, VGG, and U-Net, and to various datasets for image classification and semantic segmentation. Indeed, our current MARL-based approach reduces model size, training time, and inference time by up to 83x, 52%, and 54%, respectively, without any degradation in accuracy.

Our next target in this research line is to extend our MARL-based approach so that it is competitive with any search method targeting large DCNNs in terms of final network accuracy and number of optimized parameters, while significantly reducing the optimization cost in an automated fashion.

Related Publications

Multi-Agent Reinforcement Learning for Hyperparameter Optimization of Deep Convolutional Neural Networks
Iranfar, Arman; Zapater Sancho, Marina; Atienza Alonso, David
2021. Journal of IEEE Transactions on Computer-Aided Design. Publication funded by Compusapien (Next-gen computing systems inspired by the human brain). Publication funded by DeepHealth H2020 (Deep-Learning and HPC to Boost Biomedical Applications for Health). Publication funded by Facebook (Unrestricted Research Grant).
ECOGreen: Electricity Cost Optimization for Green Datacenters in Emerging Power Markets
Pahlevan, Ali; Zapater Sancho, Marina; Coskun, Ayse K.; Atienza Alonso, David
2021. IEEE Transactions on Sustainable Computing (T-SUSC). Publication funded by Compusapien (Next-gen computing systems inspired by the human brain). Publication funded by RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems).