Neural Network Quantization and Pruning

Sources of Funding

Compusapien
WiPLASH H2020
Fvllmonti
SNF ML-edge
ACCESS


Convolutional Neural Networks (CNNs) can be compute-intensive models that strain the capabilities of the embedded devices executing them. Hardware accelerators support edge-computing environments by providing specialized resources that increase computation efficiency. Nevertheless, they usually reduce flexibility, either by providing a limited set of operations or by supporting only integer operands of specific bitwidths. An HW-SW co-design strategy is therefore key in this context to synergistically combine CNN optimizations with the underlying HW modules. Pruning and quantization are algorithmic-level transformations that effectively reduce memory requirements and computational complexity, but they can also degrade CNN accuracy. As a consequence, they must be employed carefully, directly controlling accuracy degradation during the optimization phase and taking the HW characteristics into account so that the transformations translate into actual efficiency gains.
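To make the two transformations concrete, below is a minimal NumPy sketch (illustrative only, not code from the publications listed below): it applies unstructured magnitude pruning, then uniform symmetric quantization, and keeps the smallest bitwidth, among a set of widths assumed to be supported by the accelerator, whose quantization error stays within an illustrative budget. The function names, bitwidth set, and error threshold are all assumptions; in a real flow the error proxy would be replaced by validation accuracy measured after each transformation.

import numpy as np

def prune_magnitude(weights, sparsity):
    # Unstructured pruning: zero out the smallest-magnitude fraction of weights.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_symmetric(weights, bitwidth):
    # Uniform symmetric quantization to signed integers of the given bitwidth.
    qmax = 2 ** (bitwidth - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale  # dequantize as q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

# Prune half of the weights; this degradation is accepted upstream.
w_pruned = prune_magnitude(w, sparsity=0.5)

# HW-aware bitwidth selection: keep the smallest bitwidth whose added
# quantization error stays within an (illustrative) budget.
for bitwidth in (4, 8, 16):  # operand widths assumed supported by the accelerator
    q, scale = quantize_symmetric(w_pruned, bitwidth)
    mse = float(np.mean((w_pruned - q.astype(np.float32) * scale) ** 2))
    print(f"{bitwidth}-bit quantization MSE: {mse:.2e}")
    if mse <= 1e-3:
        print(f"selected {bitwidth}-bit operands")
        break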



Related Publications

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
IEEE Transactions on Emerging Topics in Computing, 2023. Funded by Fvllmonti (FETPROACT), WiPLASH H2020 (New on-chip wireless communication plane), and SNF ML-edge (Enabling Machine-Learning-Based Health Monitoring in Edge Sensors via Architectural Customization).
Overflow-free compute memories for edge AI acceleration
Ponzina, Flavio; Rios, Marco Antonio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
ACM Transactions on Embedded Computing Systems (TECS), 2023. Funded by WiPLASH H2020 (New on-chip wireless communication plane), Fvllmonti (FETPROACT), and ACCESS.