Student projects at LAP

Accelerate FPGA Implementation Time

Supervisor: Andrea Guerrieri ([email protected])

FPGAs are powerful devices deployed everywhere from embedded applications to datacentres. However, developing with FPGAs requires hardware design skills and a long implementation time. High-level synthesis (HLS) bridges part of this gap by creating hardware designs from C/C++ code. However, the time required to generate a bitstream configuration to download to the FPGA remains very long. This project aims to accelerate the FPGA implementation flow from several minutes to a few seconds.

 

Post-quantum cryptography core using high-level synthesis

Supervisor: Andrea Guerrieri ([email protected])

In 2016, the American National Institute of Standards and Technology (NIST) started a contest to qualify a set of post-quantum cryptography standards. High-level synthesis (HLS) automatically produces HDL code for FPGAs from C/C++, bridging the gap from algorithm to hardware design. However, its quality of results (QoR) can be suboptimal compared to RTL-based (Register Transfer Level) designs. This project aims to explore and improve the QoR of post-quantum cryptography hardware generated with HLS.

 

Reconfigurable SoCs with Quantum Optics for Astronomical and Aerospace Applications

Supervisor: Andrea Guerrieri ([email protected])

Reconfigurable System-on-Chips (SoCs) are incredibly versatile and flexible electronic devices, offering the capability to fulfill real-time constraints across various domains such as astronomy and aerospace, among many others. This project focuses on harnessing the potential of reconfigurable SoCs to establish interfaces to and control of quantum sensors. To achieve this goal, advanced design techniques, including High-Level Synthesis (HLS) and the Chisel hardware construction language, will be applied, ensuring that cutting-edge, efficient methods are employed throughout the project.

 

Beyond Instruction-Level Parallelism in Dataflow Circuits

Supervisor: Ayatallah Elakhras ([email protected])

Pipelining is an important technique for exploiting parallelism by overlapping operations in spatial computing. A key characteristic of a pipeline is its initiation interval (II): the number of cycles that must elapse before it can accept a new set of inputs. The II is the inverse of the pipeline's throughput. A perfect pipeline has an II of 1 cycle; such a pipeline has maximum resource efficiency, as all pipeline stages are kept busy all the time.

High-level synthesis (HLS) is the process of automatically generating an RTL design from an input in a high-level language such as C/C++. An important transformation in HLS is loop pipelining, and existing HLS techniques pipeline loops with the ultimate target of achieving an II of 1. Thanks to its runtime mechanisms, dynamically scheduled HLS through dataflow circuit generation pipelines irregular loops with complicated control structures better than its statically scheduled counterpart. Yet, dataflow circuit generation techniques fail to achieve an II of 1 in the presence of a loop-carried dependency through a high-latency operation, which decreases both the resource efficiency and the throughput of the application. One straightforward solution to increase the throughput is to replicate resources and execute multiple sets of inputs in parallel; however, this does not improve resource efficiency. The goal of this project is to explore a better solution that increases both throughput and resource efficiency: whenever one set of inputs is blocked on a data dependency, a new set of inputs is accepted and passed to the dataflow circuit to execute on the idle resources. Doing so in dataflow circuits requires extending the existing handshake protocol to support token tagging. This achieves a coarse-grained pipeline with an II of 1 and increases the overall throughput of the application by executing multiple sets of inputs in parallel while sharing resources among them.
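As a small illustration (a hypothetical kernel, not taken from the project description), the following C loop carries a dependency through a floating-point accumulation: each iteration needs the previous iteration's value of `acc`, and the multiply takes several cycles in hardware, so no schedule, static or dynamic, can start a new iteration every cycle.

```c
#include <assert.h>

/* Illustrative kernel: each iteration reads the `acc` produced by the
 * previous one through a multi-cycle floating-point multiply, so a
 * pipelined implementation of this loop cannot reach an II of 1. */
float running_product(const float *a, int n) {
    float acc = 1.0f;
    for (int i = 0; i < n; ++i)
        acc = acc * a[i];   /* loop-carried dependency on a high-latency op */
    return acc;
}
```

While one invocation of this loop stalls on the multiply, most of the circuit sits idle; the tagging scheme described above would let a second, independent set of inputs use those idle resources.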

 

An Area-Efficient Memory Interface for Dataflow Circuits

Supervisor: Ayatallah Elakhras ([email protected])

In recent years, dynamically scheduled HLS through dataflow circuit generation has proven its effectiveness over its statically scheduled counterpart, thanks to its flexibility in dealing with the unpredictability of program control decisions and the variable latency of some operations. Applications with complicated control structures and irregular memory access patterns are good candidates for it. However, when the memory access pattern cannot be statically disambiguated, a runtime mechanism that orders memory accesses is required to prevent potential memory hazards and maintain the correctness of sequential program execution. Out-of-order processors typically rely on load-store queues (LSQs) for this purpose, and so do dataflow circuits. Although LSQs do a great job at responding to memory accesses as fast as possible while maintaining the correct semantic order, they are a main source of area and power inefficiency, especially in the context of dataflow circuits. The goal of this project is to abandon LSQs and replace them with local circuitry, inserted solely between memory-dependent operations, that can be as fast as LSQs but more area-efficient.
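A minimal C sketch of the disambiguation problem (illustrative, not from the project statement): when loads and stores go through data-dependent indices, the compiler cannot prove whether two accesses touch the same address, so their ordering must be enforced at runtime.

```c
#include <assert.h>

/* Histogram update: whether iteration j's load of hist[idx[j]] conflicts
 * with iteration i's store (i < j) depends on the runtime contents of
 * idx[], so the accesses cannot be statically disambiguated. An LSQ, or
 * the lighter local circuitry this project proposes, must preserve the
 * load/store order whenever the addresses collide. */
void histogram(int *hist, const int *idx, int n) {
    for (int i = 0; i < n; ++i)
        hist[idx[i]] = hist[idx[i]] + 1;  /* potential read-after-write hazard */
}
```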

 

Analysis of Memory Accesses of Task-Level Algorithms 

Supervisor: Canberk Sonmez ([email protected])

Task-level parallelism (TLP) is an abstraction that models program execution as multiple tasks that can execute in parallel. TLP has proven effective for a variety of workloads, most prominently graph processing algorithms, and we believe that TLP is a great candidate for FPGA acceleration. While prior work has explored the computation aspect of TLP on both CPUs and FPGAs, the memory aspect is yet to be investigated. In this project, the student is expected to perform an in-depth analysis of various graph processing algorithms to help design a specialized memory controller for our FPGA-based TLP acceleration framework. This analysis includes, but is not limited to, investigating spatial and temporal locality as well as data reuse patterns within and across tasks.
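For concreteness (an illustrative sketch, not part of the project statement), a typical graph kernel stored in compressed sparse row (CSR) form makes exactly the kind of indirect memory accesses whose locality this analysis would characterize:

```c
#include <assert.h>

/* Sum of neighbour values for one vertex of a graph in CSR form.
 * row_ptr/col_idx are the standard CSR arrays. The edge arrays are
 * streamed sequentially (good spatial locality), but the indirect read
 * val[col_idx[e]] jumps across memory (poor spatial locality), with
 * temporal reuse only when tasks share neighbours. */
int neighbour_sum(const int *row_ptr, const int *col_idx,
                  const int *val, int v) {
    int sum = 0;
    for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e)
        sum += val[col_idx[e]];
    return sum;
}
```

A memory controller specialized for TLP would need to serve the sequential `row_ptr`/`col_idx` streams and the random `val` accesses with very different strategies, which is what the locality analysis is meant to quantify.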

 

Toward Coarse Grained Field Programmable Gate Arrays

Supervisor: Louis Coulon ([email protected])

Reconfigurable computing systems are ideal candidates to fulfill the growing need for hardware specialization. They enable the mapping of applications to efficient dedicated digital circuits implemented on generic computing fabrics, thus improving both performance and energy efficiency while still supporting virtually any application. In practice, the only reconfigurable computing system available commercially today is the FPGA, recently adopted as a computing device in datacenters and targeted by all high-level synthesis compilers to ease its programming. However, in a datacenter setting, there is a striking mismatch between the FPGA programmable fabric and the computing needs of applications: applications generally perform word-level computations, while FPGAs provide bit-level reconfigurability and bit-level abstractions. In this project, we study a coarser-granularity version of FPGAs and evaluate its performance as a word-level reconfigurable computing fabric. We offer opportunities to work on new compiler optimizations targeting this new category of processors, new architectural ideas, and the physical implementation and modeling of such processors.

 

Open-Source Work on Dynamic High-level Synthesis Compiler Targeting FPGAs

Supervisor: Lucas Ramirez ([email protected])

Dynamic high-level synthesis (DHLS) is the process of turning high-level source code into synchronous dynamically scheduled circuits. We offer opportunities to work on an open-source DHLS compiler called Dynamatic that is based on the MLIR compiler ecosystem and which generates synthesizable RTL that targets FPGA architectures from C/C++ code. Projects can revolve around writing compiler passes that transform our intermediate representation (IR) at some level of our progressive lowering pipeline. These go from close-to-source high-level analysis steps (e.g., memory dependence analysis, loop transformations) down to dataflow circuit-level transformations (e.g., buffer placement, bitwidth optimization). We also have open lines of work in the infrastructure surrounding the core compiler, be it debugging, visualization, or benchmarking tools. Regardless of the exact line of work, as an open-source project we value good software design, solid development practices, and want to make any student contribution a permanent part of Dynamatic’s codebase. If you would like to discuss potential directions, please reach out with your resume/transcript and any specific interest you may have.

 

Processing Engines Development for Task Level Parallelism

Supervisor: Mohamed Shahawy ([email protected])

Task Level Parallelism (TLP) is an approach to parallelizing programs by dividing them into a number of independent tasks. TLP is supported in software by several compilers, such as OpenCilk, and libraries, such as Intel TBB. TLP is beneficial for workloads that exhibit control dependencies and irregular memory access patterns, such as graph and tree processing. Currently, TLP runs mainly on parallel CPUs; our lab aims to build TLP support for FPGA computing. In this project, the student is expected to implement and verify a set of Processing Elements (PEs) that use the TLP backbone infrastructure developed at LAP. The PEs would range from simple benchmark PEs to more complex ones that implement modern parallel graph mining algorithms.

 

Wishbone bus architecture generator

Supervisor: Theo Kluter ([email protected])

The Wishbone bus is an open-source hardware computer bus that can be used in embedded systems to connect different components in a memory-mapped way. The Wishbone standard allows for shared-bus, dataflow, and crossbar architectures. In this project, you are going to design a tool that generates the HDL (VHDL and/or Verilog) implementing a Wishbone architecture, given a specification of the master(s) and slave(s) connected to the bus.
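As a sketch of what one small piece of such a generator might emit (entirely illustrative: the function, signal names, and address-map format are assumptions, not taken from the Wishbone specification), a shared-bus configuration needs an address decoder that asserts the strobe of the slave whose region contains the master's address:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit one Verilog address-decode assignment per slave of a shared
 * Wishbone bus. Base addresses and masks would come from the user's
 * specification; the surrounding module boilerplate is omitted and all
 * names are illustrative. Returns the number of characters written. */
int emit_decoder(char *out, size_t cap, const unsigned *base,
                 const unsigned *mask, int nslaves) {
    int len = 0;
    for (int s = 0; s < nslaves; ++s)
        len += snprintf(out + len, cap - len,
                        "assign stb_s%d = cyc_i & stb_i & "
                        "((adr_i & 32'h%08X) == 32'h%08X);\n",
                        s, mask[s], base[s]);
    return len;
}
```

A real generator would additionally emit the data multiplexers, acknowledge routing, and, for the crossbar variant, per-master arbitration.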

 

RISC-V educational simulator

Supervisor: Theo Kluter ([email protected])

The open-source RISC-V instruction set architecture (ISA) is becoming more and more important in industry and research, which makes it a good fit for our educational program. In this project, you are going to implement a RISC-V ISA simulator framework that can be used to introduce students to computer architecture.
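The core of such a simulator is a fetch-decode-execute loop over the architectural state. A minimal sketch (hypothetical structure; a real framework would cover the full RV32I set, a memory model, and tracing hooks for teaching) decodes a 32-bit instruction word and executes it against a register file:

```c
#include <assert.h>
#include <stdint.h>

/* Execute one RV32I instruction on a 32-entry register file.
 * Only ADDI (opcode 0x13, funct3 0) and ADD (opcode 0x33, funct3 0)
 * are handled in this sketch; funct7 (e.g., to distinguish SUB) and
 * all other instructions are omitted. x0 stays hard-wired to zero,
 * as the ISA requires. */
void step(uint32_t regs[32], uint32_t instr) {
    uint32_t opcode = instr & 0x7F;
    uint32_t rd     = (instr >> 7)  & 0x1F;
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t rs1    = (instr >> 15) & 0x1F;
    uint32_t rs2    = (instr >> 20) & 0x1F;
    int32_t  imm    = (int32_t)instr >> 20;   /* sign-extended I-immediate */

    if (opcode == 0x13 && funct3 == 0)
        regs[rd] = regs[rs1] + (uint32_t)imm;  /* ADDI */
    else if (opcode == 0x33 && funct3 == 0)
        regs[rd] = regs[rs1] + regs[rs2];      /* ADD */
    regs[0] = 0;  /* x0 always reads as zero */
}
```

For classroom use, the same decode skeleton extends instruction by instruction, which lets students grow the simulator alongside the lecture material.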