Systolic Array Accelerators for Transformers

Transformer models are difficult to deploy on resource-constrained platforms because of their computational complexity and large parameter counts. We address this challenge by introducing tightly-coupled, small-scale systolic arrays (TiC-SATs), governed by dedicated ISA extensions, to accelerate execution. We also employ software optimizations that maximize data reuse and lower miss rates across cache hierarchies. The TiC-SAT framework is available as open source.

Keywords: Systolic Array, Tightly-coupled Accelerators, Transformers


  Amirshahi Alireza
  Ansaloni Giovanni
  Atienza Alonso David
  Klein Joshua Alexander Harrison

Research Partners

Logitech Europe SA

Sources of Funding


Our project addresses the computational challenge posed by the massive size and large parameter counts of typical transformer implementations in artificial intelligence (AI) scenarios. Transformers, originally developed for natural language processing (NLP) tasks, are now widely used in applications such as question answering, sentiment analysis, image classification, clinical note analysis, and speech-to-text generation.
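Transformer inference is dominated by large matrix multiplications: the query/key/value projections and the attention products are all GEMMs, which is what makes systolic-array acceleration attractive. As an illustration only (the function name, shapes, and decomposition are our own, not the paper's), a single attention head can be written as a chain of matmul calls, each of which an SA tile could accelerate:

```python
import numpy as np

def attention_head(X, Wq, Wk, Wv):
    """Single attention head as a chain of GEMMs (illustrative sketch).
    Every np.matmul below is a candidate for systolic-array offload."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # three projection GEMMs
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])  # attention-score GEMM
    # numerically stable softmax over the last axis
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V                           # weighted-sum GEMM
```

Only the softmax is not a matrix multiplication, so the bulk of the runtime maps onto GEMM hardware.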

To accelerate the inference of transformer models, we propose a novel strategy called TiC-SAT (Tightly-Coupled Systolic Array Accelerators for Transformers). TiC-SATs are integrated into CPUs as custom functional units governed by dedicated instructions, avoiding the need for dedicated scratchpad memories and reducing resource consumption. Moreover, TiC-SATs leverage software optimizations that increase data locality, taking advantage of available resources in cache hierarchies without disrupting locality when transitioning from accelerated to non-accelerated computation segments.
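The interaction between the custom instructions and the array can be sketched as follows. This is a toy functional model, not the actual TiC-SAT hardware or ISA: the class and method names are hypothetical stand-ins for the weight-load and compute instructions, and the per-cycle systolic dataflow is abstracted into a dense multiply.

```python
import numpy as np

class SystolicTile:
    """Toy model of a weight-stationary systolic tile: a weight sub-matrix
    is preloaded once, then activation tiles are streamed through it."""

    def __init__(self, size):
        self.size = size  # tile edge length; TiC-SAT explores several sizes
        self.w = None

    def load_weights(self, w_tile):
        # stands in for a hypothetical 'load weights' custom instruction
        assert w_tile.shape[0] <= self.size and w_tile.shape[1] <= self.size
        self.w = w_tile

    def stream(self, a_tile):
        # stands in for a hypothetical 'stream activations / compute' instruction
        return a_tile @ self.w

def tiled_matmul(A, W, tile=8):
    """Blocked GEMM on a tile x tile array: each weight tile is loaded once
    and reused across all activation row-blocks."""
    (M, K), (K2, N) = A.shape, W.shape
    assert K == K2
    C = np.zeros((M, N))
    sa = SystolicTile(tile)
    for k0 in range(0, K, tile):
        for n0 in range(0, N, tile):
            sa.load_weights(W[k0:k0 + tile, n0:n0 + tile])
            for m0 in range(0, M, tile):
                C[m0:m0 + tile, n0:n0 + tile] += sa.stream(A[m0:m0 + tile, k0:k0 + tile])
    return C
```

Because the tile sits inside the CPU pipeline and operands come from the ordinary cache hierarchy, there is no scratchpad to fill or drain between the accelerated GEMM tiles and the surrounding scalar code.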

To validate our strategy, we implement TiC-SAT as a parametric module in the gem5-X full system simulation environment and conduct comprehensive explorations across various SA sizes and benchmark applications. Our contributions include showcasing how SA accelerators can be integrated into computing systems, enabling full-system and application-wide explorations, and highlighting how tightly-coupled lightweight SAs, such as TiC-SATs, can aptly exploit software optimizations to improve data locality and performance. We also assess the benefits of small-scale, tightly-coupled SAs for accelerating inference in transformer models, considering different TiC-SAT sizes and benchmark applications.
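The payoff of the data-reuse optimizations can be made concrete with a back-of-the-envelope cost model (our own illustrative arithmetic, not figures from the paper): on a tile x tile weight-stationary array, reusing each preloaded weight tile across every activation row-block cuts the number of weight loads by a factor of roughly M/tile compared with reloading per row-block.

```python
import math

def weight_loads(M, K, N, tile, reuse_rows):
    """Count weight-tile preloads for an M x K by K x N blocked GEMM on a
    tile x tile array. With reuse_rows, each (k, n) weight tile is loaded
    once; without it, it is reloaded for every activation row-block.
    Illustrative cost model only."""
    kt = math.ceil(K / tile)  # weight tiles along K
    nt = math.ceil(N / tile)  # weight tiles along N
    mt = math.ceil(M / tile)  # activation row-blocks along M
    return kt * nt if reuse_rows else kt * nt * mt
```

For example, a 512 x 512 x 512 GEMM on an 8 x 8 tile needs 4096 weight loads with row reuse versus 262144 without, a 64x reduction that translates directly into fewer cache misses.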

Find us on GitHub:

Related Publications

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures
Amirshahi, Alireza; Ansaloni, Giovanni; Atienza Alonso, David
2024-01-18, Conference Paper. Funded by Fvllmonti (FETPROACT) and ACCESS.
TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers
Amirshahi, Alireza; Klein, Joshua Alexander Harrison; Ansaloni, Giovanni; Atienza Alonso, David
2023-01-16, Conference Paper. Funded by WiPLASH H2020 (New on-chip wireless communication plane) and Fvllmonti (FETPROACT).