Convolutional Neural Networks (CNNs) are compute-intensive models that strain the capabilities of the embedded devices executing them. Hardware accelerators support edge-computing environments by providing specialized resources that increase computation efficiency. Nevertheless, they usually reduce flexibility, either by providing a limited set of operations or by supporting only integer operands of specific bitwidths. A HW-SW co-design strategy is therefore key in this context to synergistically combine CNN optimizations with the underlying HW modules. Pruning and quantization are algorithmic-level transformations that effectively reduce memory requirements and computing complexity, but can degrade CNN accuracy. As a consequence, they need to be employed carefully, directly controlling accuracy degradation during the optimization phase and taking into account the HW characteristics to effectively leverage them for efficiency.
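The two transformations mentioned above can be illustrated with a minimal sketch. The snippet below is not the method of either paper; it is a generic, hedged example of per-tensor magnitude pruning and uniform symmetric integer quantization, the kind of algorithmic-level optimizations an HW-aware co-design flow would tune (function names and the 8-bit/50%-sparsity settings are illustrative assumptions):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights until the target sparsity
    is reached (illustrative per-tensor unstructured pruning)."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_symmetric(w, bits=8):
    """Uniform symmetric quantization to `bits`-bit signed integers,
    with a single scale factor per tensor (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax      # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate floating-point tensor from the integers."""
    return q.astype(np.float32) * scale

# Example: prune then quantize a random weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, s = quantize_symmetric(w_pruned, bits=8)
max_err = np.max(np.abs(w_pruned - dequantize(q, s)))
```

Both transformations trade accuracy for efficiency: pruning shrinks the effective memory footprint, while quantization bounds the reconstruction error by at most half a quantization step (`s / 2`), which is why a co-design flow must monitor accuracy while matching the bitwidth to what the HW natively supports.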
|Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
|Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
|IEEE Transactions on Emerging Topics in Computing
|Overflow-free compute memories for edge AI acceleration
|Ponzina, Flavio; Rios, Marco Antonio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David
|ACM Transactions on Embedded Computing Systems (TECS)