In-SRAM Computing


Keywords
In-memory computing, near-memory computing.
 

Team

  Ansaloni Giovanni
  Choné Clément Renaud Jean
  Eggermann Grégoire Axel
  Rios Marco Antonio


Machine learning algorithms such as Deep Neural Networks (DNNs) have dramatically improved tasks like speech and visual recognition and object detection. Although the network training phase requires very powerful Graphics Processing Units (GPUs), inference can be performed with a much smaller energy budget and computational complexity, making edge devices an attractive target for deploying such workloads. In fact, 90% of DNN inference operations are convolutions, which require word-level multiplications. Thanks to the regularity of their data accesses and instructions, these operations benefit from fine-grain parallelism.
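To make this computational pattern concrete, the minimal C sketch below (purely illustrative; the fixed 16x16 input, 3x3 kernel, and the function name conv2d_3x3 are our own assumptions) shows how each convolution output reduces to a sum of word-level multiplications, with every output position independent of the others and hence amenable to fine-grain parallelism.

    #include <stdint.h>

    /* Illustrative sketch: inner loops of a small 2D convolution.
     * Each output element is an independent sum of word-level
     * multiply-accumulate (MAC) operations, so all (y, x) positions
     * can be computed in parallel. */
    void conv2d_3x3(const int8_t in[16][16], const int8_t k[3][3],
                    int32_t out[14][14])
    {
        for (int y = 0; y < 14; y++) {
            for (int x = 0; x < 14; x++) {
                int32_t acc = 0;
                for (int i = 0; i < 3; i++)
                    for (int j = 0; j < 3; j++)
                        acc += in[y + i][x + j] * k[i][j]; /* word-level multiply */
                out[y][x] = acc;
            }
        }
    }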
 
However, in conventional Von Neumann computer architectures, the performance of such data-intensive applications is limited by the speed disparity between memory accesses and processing units. In addition, a memory access consumes orders of magnitude more energy than a bit-wise multiplication. In-Memory Computing (IMC) architectures are therefore considered one of the most promising ways to break the memory wall, provide massive parallelism, and reduce energy consumption.
 
In-SRAM computing is particularly interesting because it enables the seamless integration of SRAM-based IMC architectures with the most advanced standard architectures; moreover, the maturity of SRAM technology brings these designs closer to commercial availability. In-SRAM computing can be performed through bit-line computing: activating multiple word-lines of an SRAM array within the same read cycle. This non-trivial operation produces bitwise logic operations (AND and NOR) between the accessed words, directly on the bit-lines. It enables SIMD operations within the memory hierarchy while preserving the inherent geometry of caches and other SRAM arrays.
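As an illustration of bit-line computing, the behavioral C sketch below models what a read cycle with two word-lines asserted delivers at the sense amplifiers: the bit-lines evaluate the AND of the two accessed words and the complementary bit-lines their NOR, bit-parallel across all columns. This is a simplified software model under our own assumptions (array dimensions, names such as sram_array_t and bitline_op), not the actual circuit.

    #include <stdint.h>
    #include <stdio.h>

    /* 64 word-lines of 32 SRAM cells each; one uint32_t per row. */
    typedef struct { uint32_t rows[64]; } sram_array_t;

    /* Behavioral model of a read cycle with TWO word-lines asserted:
     * the bit-line stays high only where both cells store 1  -> a AND b;
     * the complementary bit-line, only where both store 0    -> a NOR b. */
    void bitline_op(const sram_array_t *m, int wl_a, int wl_b,
                    uint32_t *bl_and, uint32_t *blb_nor)
    {
        uint32_t a = m->rows[wl_a];
        uint32_t b = m->rows[wl_b];
        *bl_and  = a & b;
        *blb_nor = ~(a | b);
    }

    int main(void)
    {
        sram_array_t m = { .rows = { 0xF0F0F0F0u, 0xFF00FF00u } };
        uint32_t and_r, nor_r;
        bitline_op(&m, 0, 1, &and_r, &nor_r);
        printf("AND: %08X  NOR: %08X\n", and_r, nor_r); /* F000F000, 000F000F */
        return 0;
    }

Richer operations (OR, XOR, additions and, ultimately, the multiplications needed by convolutions) can then be composed from these AND/NOR primitives by the memory periphery, which is the direction explored in the publications listed below.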

At ESL, we have proposed several architectures based on bit-line computing, together with co-design methods that extract the maximum efficiency from these architectures.

Videos

Bit-line computing between two bits



Related Publications

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Marco Rios, Flavio Ponzina, Alexandre Levisse, Giovanni Ansaloni, David Atienza
IEEE Transactions on Emerging Topics in Computing, 2023. (Funded by: Fvllmonti (FETPROACT), WiPLASH H2020, SNF ML-edge.)
Overflow-free compute memories for edge AI acceleration
Flavio Ponzina, Marco Rios, Alexandre Levisse, Giovanni Ansaloni, David Atienza
ACM Transactions on Embedded Computing Systems (TECS), 2023. (Funded by: WiPLASH H2020, Fvllmonti (FETPROACT), ACCESS.)
A hardware/software co-design vision for deep learning at the edge
Flavio Ponzina, Simone Machetti, Marco Rios, Benoît Denkinger, Alexandre Levisse, Giovanni Ansaloni, Miguel Peon Quiros, David Atienza
IEEE Micro, Special Issue on Artificial Intelligence at the Edge, 2022. (Funded by: SNF ML-edge, Compusapien, WiPLASH H2020.)
Running Efficiently CNNs on the Edge Thanks to Hybrid SRAM-RRAM In-Memory Computing
Marco Rios, Flavio Ponzina, Giovanni Ansaloni, Alexandre Levisse, David Atienza
Design, Automation and Test in Europe Conference (DATE 2021), Virtual Conference and Exhibition, February 1-5, 2021. (Funded by: SNF ML-edge, WiPLASH H2020, Compusapien.)
BLADE: An in-Cache Computing Architecture for Edge Devices
William Simon, Yasir Qureshi, Marco Rios, Alexandre Levisse, Marina Zapater, David Atienza
IEEE Transactions on Computers, 2020. (Funded by: RECIPE H2020, WiPLASH H2020, Compusapien.)
An Associativity-Agnostic in-Cache Computing Architecture Optimized for Multiplication
Marco Rios, William Simon, Alexandre Levisse, Marina Zapater, David Atienza
IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC), 2019. (Funded by: Compusapien.)
A fast, reliable and wide-voltage-range in-memory computing architecture
William Simon, Juan Galicia, Alexandre Levisse, Marina Zapater, David Atienza
IEEE/ACM Design Automation Conference (DAC), 2019. (Funded by: Compusapien.)
BLADE: A BitLine Accelerator for Devices on the Edge
William Simon, Yasir Qureshi, Alexandre Levisse, Marina Zapater, David Atienza
ACM Great Lakes Symposium on VLSI (GLSVLSI), 2019. (Funded by: Compusapien, RECIPE H2020.)