Performance Evaluation of Fault-Tolerance Architectures

Contact: Maaz Mashood Mohiuddin

Background:

The primary aim of a fault-tolerance architecture is to mask software and hardware faults so that they do not affect the application or service it protects. These faults include crashes (benign faults), delays in the application, and arbitrary (Byzantine) faults, all of which lead to undesirable effects.

For real-time applications, providing fault tolerance is not enough: the architecture must also respect the application's real-time requirements, such as bounded delay. Furthermore, in large-scale deployments it is important to be efficient in the number of replicas, i.e., to provide a given level of fault tolerance with fewer replicas.
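
As a rough illustration of replica efficiency, the sketch below computes the minimum number of replicas needed to tolerate f simultaneous faults under three textbook bounds: f + 1 for primary-backup replication under crash faults (assuming reliable failure detection), 2f + 1 for majority-quorum consensus such as Paxos under crash faults, and 3f + 1 for Byzantine agreement. The function name and scheme labels are hypothetical and are not taken from [1], [2], or [3]; the replica requirements of a specific architecture may differ.

```python
# Textbook minimum replica counts for tolerating f simultaneous faults.
# The scheme labels are illustrative names chosen for this sketch,
# not identifiers from any library or from the referenced papers.
REPLICA_BOUNDS = {
    "primary_backup_crash": lambda f: f + 1,      # assumes reliable failure detection
    "consensus_crash": lambda f: 2 * f + 1,       # majority quorums, e.g. Paxos
    "byzantine_agreement": lambda f: 3 * f + 1,   # arbitrary (Byzantine) faults
}


def min_replicas(scheme: str, f: int) -> int:
    """Return the minimum number of replicas needed to tolerate f faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return REPLICA_BOUNDS[scheme](f)


if __name__ == "__main__":
    for f in (1, 2, 3):
        counts = {scheme: min_replicas(scheme, f) for scheme in REPLICA_BOUNDS}
        print(f"f={f}: {counts}")
```

Under these bounds, tolerating the same number of faults costs more replicas as the fault model gets stronger, which is the efficiency trade-off the evaluation is meant to quantify.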

In this project, we aim to evaluate the delay and replica efficiency of existing fault-tolerance architectures [1][2] and compare them with Axo [3], a low-delay fault-tolerance architecture designed for real-time applications.
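
One possible way to summarise delay measurements against a bounded-delay requirement is sketched below. It is only an assumption about how the evaluation might be set up: the function name, the choice of metrics (worst case, 99th percentile, deadline-miss ratio), and the sample values are hypothetical and do not come from [1], [2], or [3].

```python
def summarize_delays(delays_ms, deadline_ms):
    """Summarise delay samples (in milliseconds) against a deadline.

    Returns the worst-case delay, the 99th-percentile delay (nearest-rank
    method), and the fraction of samples that exceeded the deadline.
    """
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    ordered = sorted(delays_ms)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed with integer
    # arithmetic to avoid floating-point rounding surprises.
    rank = -(-99 * n // 100)
    misses = sum(1 for d in ordered if d > deadline_ms)
    return {
        "max_ms": ordered[-1],
        "p99_ms": ordered[rank - 1],
        "miss_ratio": misses / n,
    }


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    samples = [1.2, 0.9, 1.1, 4.8, 1.0, 1.3, 0.8, 1.1]
    print(summarize_delays(samples, deadline_ms=2.0))
```

For real-time applications the worst-case delay and the miss ratio are usually more informative than the mean, since a single late response can violate the delay bound.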

Project Goals:

  • Understand existing fault-tolerance architectures
  • Design evaluation scenarios suited to real-time applications
  • Implement the scenarios

Benefits:

Reliability is central to all computer systems. Studying fault-tolerance architectures helps one understand the trade-offs involved in building such architectures, and computer systems in general. It also opens up opportunities in both industry and academia.

Required Skills:

  • C/C++
  • Performance Evaluation

References:

[1] http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf

[2] http://www.sc.ehu.es/acwlaalm/sdi/replication-schemas.pdf

[3] http://infoscience.epfl.ch/record/217463

Supervisors: Maaz Mashood Mohiuddin, Wajeb Saab