Neural Network Quantization and Pruning

Sources of Funding

Compusapien
WiPLASH H2020
Fvllmonti
SNF ML-edge
ACCESS


Convolutional Neural Networks (CNNs) can be compute-intensive models that strain the capabilities of the embedded devices executing them. Hardware accelerators support edge-computing environments by providing specialized resources that increase computational efficiency. Nevertheless, they usually reduce flexibility, either by providing a limited set of operations or by supporting only integer operands of specific bitwidths. A HW-SW co-design strategy is therefore key in this context to synergistically combine CNN optimizations with the underlying HW modules. Pruning and quantization are algorithmic-level transformations that effectively reduce memory requirements and computational complexity, but they can also degrade CNN accuracy. As a consequence, they must be employed carefully, directly controlling accuracy degradation during the optimization phase and taking the HW characteristics into account to effectively improve efficiency.
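To make the two transformations concrete, the following is a minimal NumPy sketch of magnitude-based pruning and uniform symmetric integer quantization applied to a weight tensor. It is an illustrative example only, not the co-design flow used in the publications below; the function names and the fixed per-tensor scale are assumptions chosen for clarity.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights until roughly
    `sparsity` (a fraction in [0, 1]) of the entries are zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_symmetric(weights, bitwidth):
    """Uniform symmetric quantization to signed integers of the
    given bitwidth, using a single per-tensor scale factor."""
    qmax = 2 ** (bitwidth - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale  # dequantize as q * scale
```

In a real deployment, the sparsity target and bitwidth would be selected while monitoring validation accuracy, and the bitwidth would be matched to the integer operand widths supported by the target accelerator.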



Related Publications

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
2023, IEEE Transactions on Emerging Topics in Computing. Funded by Fvllmonti (Ferroelectric Vertical Low energy Low latency low volume Modules fOr Neural network Transformers In 3D), WiPLASH H2020 (Architecting More Than Moore – Wireless Plasticity for Heterogeneous Massive Computer Architectures), and SNF ML-edge (Enabling Machine-Learning-Based Health Monitoring in Edge Sensors via Architectural Customization, Swiss NSF Research Project, Div. II).
Overflow-free compute memories for edge AI acceleration
Ponzina, Flavio; Rios, Marco Antonio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
2023, ACM Transactions on Embedded Computing Systems (TECS). Funded by WiPLASH H2020 (Architecting More Than Moore – Wireless Plasticity for Heterogeneous Massive Computer Architectures), Fvllmonti (Ferroelectric Vertical Low energy Low latency low volume Modules fOr Neural network Transformers In 3D), and ACCESS (AI Chip Center for Emerging Smart Systems, sponsored by InnoHK funding, Hong Kong SAR).