Neural Network Quantization and Pruning

Sources of Funding

Compusapien
WiPLASH H2020
Fvllmonti
SNF ML-edge
ACCESS


Convolutional Neural Networks (CNNs) can be compute-intensive models that strain the capabilities of the embedded devices executing them. Hardware accelerators support edge-computing environments by providing specialized resources that increase computation efficiency. Nevertheless, they usually reduce flexibility, either by providing a limited set of operations or by supporting only integer operands of specific bitwidths. An HW-SW co-design strategy is therefore key in this context to synergistically combine CNN optimizations with the underlying HW modules. Pruning and quantization are algorithmic-level transformations that effectively reduce memory requirements and computational complexity, but they can also degrade CNN accuracy. As a consequence, they must be employed carefully, directly controlling accuracy degradation during the optimization phase and taking the HW characteristics into account so that the transformations translate into actual efficiency gains.
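To make the two transformations concrete, below is a minimal NumPy sketch (illustrative only, not code from the publications listed below): it applies unstructured magnitude pruning, then uniform symmetric quantization, and keeps the smallest bitwidth, among a set of widths assumed to be supported by the accelerator, whose quantization error stays within an illustrative budget. The function names, bitwidth set, and error threshold are all assumptions; in a real flow the error proxy would be replaced by validation accuracy measured after each transformation.

import numpy as np

def prune_magnitude(weights, sparsity):
    # Unstructured pruning: zero out the smallest-magnitude fraction of weights.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_symmetric(weights, bitwidth):
    # Uniform symmetric quantization to signed integers of the given bitwidth.
    qmax = 2 ** (bitwidth - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale  # dequantize as q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

# Prune half of the weights; this degradation is accepted upstream.
w_pruned = prune_magnitude(w, sparsity=0.5)

# HW-aware bitwidth selection: keep the smallest bitwidth whose added
# quantization error stays within an (illustrative) budget.
for bitwidth in (4, 8, 16):  # operand widths assumed supported by the accelerator
    q, scale = quantize_symmetric(w_pruned, bitwidth)
    mse = float(np.mean((w_pruned - q.astype(np.float32) * scale) ** 2))
    print(f"{bitwidth}-bit quantization MSE: {mse:.2e}")
    if mse <= 1e-3:
        print(f"selected {bitwidth}-bit operands")
        break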



Related Publications

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
IEEE Transactions on Emerging Topics in Computing, 2023. Funded by Fvllmonti (FETPROACT), WiPLASH H2020 (New on-chip wireless communication plane), and SNF ML-edge (Enabling Machine-Learning-Based Health Monitoring in Edge Sensors via Architectural Customization).
Overflow-free compute memories for edge AI acceleration
Ponzina, Flavio; Rios, Marco Antonio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
ACM Transactions on Embedded Computing Systems (TECS), 2023. Funded by WiPLASH H2020 (New on-chip wireless communication plane), Fvllmonti (FETPROACT), and ACCESS.