Hardware and Software Techniques for Reliability-Aware Design in Embedded Systems and MPSoCs

Contact person:  Prof. David Atienza ([email protected])

Partners

Computer Architecture and Automation Department (DACYA), UCM, Spain;

Electrical Engineering and Computer Science (DEIS), Bologna Univ., Italy.

Presentation

The relentless scaling of technology and the resulting increase in transistor density are the primary reasons that complex embedded systems and Multi-Processor Systems-on-Chip (MPSoCs) have become possible. However, power requirements have not scaled accordingly, causing power densities to skyrocket and on-chip temperatures to rise at alarming rates. One of the main effects of this temperature increase is the premature aging of CMOS devices, which reduces their mean time to failure (MTTF). The need for improved power and thermal management techniques therefore still exists, and extending those techniques to incorporate reliability metrics is now crucial.

In the past, there have been attempts to improve energy consumption and performance at the microarchitectural level and above. Some authors have tried to dynamically minimize energy and improve system performance by exploiting architectural and application-level adaptability. Fault-tolerant microarchitectures have also been proposed to improve reliability against hard failures. Other works propose the use of redundancy at the architecture level to improve system reliability and processor lifetime. In addition, dynamic fault-tolerance management has been studied as a method to enhance system reliability while taking into account energy efficiency, computation performance, and battery lifetime. However, all of these methods incur performance or area overheads in order to harden embedded systems and microarchitectures against soft and hard errors, so they can only be included according to the particular reliability requirements of each final system.

Goal

The purpose of this project is to analyze the effect that the compiler, the application, and the hardware have on the aging of CMOS devices in embedded systems and MPSoCs. To this end, an FPGA-based emulation platform will be used to collect thermal traces from the target architectures and to evaluate the MTTF by executing a mathematical model. The analysis performed during this first experimental stage will enable the envisioning, design, and implementation of compiler optimizations and hardware customizations that mitigate the accelerated aging of the system. The software transformations will be coded in a commercial compilation framework that provides high flexibility and modularity for implementing solutions at different levels of abstraction. The processor-based system that will be emulated on the FPGA board, and customized, will be based on SPARC-like architectures such as the LEON3.
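
As an illustration of how an MTTF estimate might be derived from a thermal trace, the following minimal sketch applies an Arrhenius-type temperature dependence, as found in common electromigration models. The project text does not specify its mathematical model, so the choice of model, the activation energy of 0.9 eV, the 60 °C reference temperature, the assumption of uniformly sampled temperatures, and all function names are assumptions made purely for illustration. The result is an MTTF relative to constant operation at the reference temperature, so technology-dependent constants cancel out.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_c, ref_c, ea_ev):
    """Arrhenius failure-rate ratio at temp_c relative to ref_c (both in Celsius)."""
    temp_k = temp_c + 273.15
    ref_k = ref_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / temp_k))


def relative_mttf(trace_c, ref_c=60.0, ea_ev=0.9):
    """MTTF of a thermal trace relative to constant operation at ref_c.

    trace_c is assumed to hold uniformly sampled temperatures in Celsius.
    The failure rate is averaged over the trace and then inverted, so hot
    phases weigh exponentially more than cool ones.
    ref_c and ea_ev are illustrative assumptions, not project values.
    """
    if not trace_c:
        raise ValueError("thermal trace must contain at least one sample")
    mean_rate = sum(acceleration_factor(t, ref_c, ea_ev) for t in trace_c) / len(trace_c)
    return 1.0 / mean_rate


if __name__ == "__main__":
    # Hypothetical trace: the core alternates between 55 C and 85 C.
    trace = [55.0, 85.0] * 50
    print(f"Relative MTTF: {relative_mttf(trace):.3f}")
```

Under this model, a value below 1 means the trace ages the device faster than constant operation at the reference temperature. Comparing the value before and after a compiler or hardware change is one possible way to quantify its effect on lifetime.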