Student Projects

Master/Bachelor

The following projects are performed in close collaboration with an experienced member of the lab.

Summary

Better Library APIs/Languages for Distributed ML

Scalable and Fast Visualizations of the Behaviour of Decentralized ML Algorithms

Car Sharing App

Trust vs Obfuscate: Comparing the Scalability Potential of Privacy-Preserving Summary Aggregators

Better Library APIs/Languages for Distributed ML

Some machine learning frameworks, such as PyTorch, have distributed libraries that assume a deterministic communication model in which the recipient must specify the sender and the correct datatype in advance in order to receive a message. This makes it cumbersome to implement algorithms in which the topology varies over time. One alternative would be to expose asynchronous message queues in which each message carries both the sender and the datatype, implemented using ZeroMQ or NNG.
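To make the idea of self-describing messages concrete, here is a minimal sketch using only the Python standard library. It is not the API of PyTorch, ZeroMQ or NNG: the names Message, Mailbox and demo_mailbox are hypothetical, and an in-process queue stands in for a real network transport.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    sender: int    # rank of the node that sent the message
    dtype: str     # name of the payload datatype, e.g. "float32"
    payload: Any   # the data itself


class Mailbox:
    """Per-node inbox (hypothetical): the receiver never names a sender in advance."""

    def __init__(self) -> None:
        self._queue = queue.Queue()

    def send(self, msg: Message) -> None:
        self._queue.put(msg)

    def recv(self, timeout: float = 1.0) -> Message:
        # Blocks until any message arrives, whoever sent it.
        return self._queue.get(timeout=timeout)


def demo_mailbox() -> list:
    inbox = Mailbox()
    senders = [
        threading.Thread(
            target=inbox.send,
            args=(Message(rank, "float32", [0.1 * rank]),),
        )
        for rank in (1, 2, 3)
    ]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    # Sender and datatype are read from each message, in arrival order.
    return sorted(inbox.recv().sender for _ in senders)
```

Because the receiver learns the sender and datatype from the message itself, neighbours can join or leave between rounds without any change to the receiving code.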

Machine learning frameworks also make the specification of distributed algorithms quite verbose compared to their mathematical formulation. The specification of an algorithm could be kept separate from the system optimizations that make it efficient. One potential approach would be to make the distributed communication transparent by using, for example, Oz-like dataflow variables as part of streams representing the discrete changes of variables.
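As a rough illustration of the dataflow idea, the sketch below emulates a single-assignment variable with Python threads: reading blocks until the variable is bound, so the computation can be written without explicit synchronization. The names DataflowVar, average_when_ready and demo_dataflow are hypothetical, and a real design would bind variables across machines rather than across threads.

```python
import threading


class DataflowVar:
    """Single-assignment variable: readers block until it is bound."""

    def __init__(self) -> None:
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value) -> None:
        with self._lock:
            if self._bound.is_set():
                raise ValueError("dataflow variable is already bound")
            self._value = value
            self._bound.set()

    def get(self, timeout: float = 1.0):
        if not self._bound.wait(timeout):
            raise TimeoutError("dataflow variable was never bound")
        return self._value


def average_when_ready(inputs, output) -> None:
    # Reads like the mathematical definition; waiting is implicit in get().
    values = [v.get() for v in inputs]
    output.bind(sum(values) / len(values))


def demo_dataflow() -> float:
    x1, x2, mean = DataflowVar(), DataflowVar(), DataflowVar()
    worker = threading.Thread(target=average_when_ready, args=([x1, x2], mean))
    worker.start()   # started before its inputs exist
    x1.bind(2.0)
    x2.bind(4.0)
    worker.join()
    return mean.get()
```

A stream of discrete changes could then be modelled as a chain of such variables, one per update, though that extension is not shown here.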

In both cases, the project consists of producing a working implementation on relevant example algorithms, with design, API and tutorial documentation, as well as some performance and scaling experiments.

Scalable and Fast Visualizations of the Behaviour of Decentralized ML Algorithms

The implementation of decentralized machine learning algorithms adds a spatial dimension in which the performance of individual nodes varies. Good interactive visualizations, implemented with libraries such as D3.js, can help quickly identify local behaviour in specific examples that could be missed by theoretical approaches or statistical analysis of execution traces. The goal of this project is to design and implement tools to visualize the behaviour of dynamic large-scale networks. The tools therefore need to support a large number of nodes and edges, filtering of information, multiple concurrent visualizations of the same data, and ideally interactive modification.

The project consists of designing and implementing one or more new visualizations, or improving the scalability of existing ones, with documentation explaining the design and how to extend or improve it.

Car Sharing App

In the last decade, many successful tech companies saw a tremendous growth in users of platforms that connect service providers and consumers. Services include ride-hailing, food delivery, lodging rental and dating (well, the notion of provider and consumer might be blurry here, but the principle is the same: connecting users with matching interests).

Matching is one thing that publish/subscribe (pub/sub) systems do well. In a nutshell, users state their interests, which are stored as subscriptions for future matching. Independently, other users may publish content that matches a subset of the subscriptions. Finally, the corresponding subscribers receive a copy of the matching publication. The nice thing is that publish/subscribe systems do not necessarily have a centralized server, which removes the need for a single company acting as intermediary.
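The subscribe, publish and deliver steps can be sketched in a few lines of Python. This is only an illustration of content-based matching: the class PubSub and its methods are hypothetical, and the sketch runs in a single process, so it does not show how matching would be decentralized.

```python
from typing import Callable, Dict, List, Tuple

Publication = Dict[str, object]
Predicate = Callable[[Publication], bool]


class PubSub:
    """In-process content-based matcher (illustrative only)."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Predicate]] = []
        self.inboxes: Dict[str, List[Publication]] = {}

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        # Store the interest for future matching.
        self._subscriptions.append((subscriber, predicate))
        self.inboxes.setdefault(subscriber, [])

    def publish(self, publication: Publication) -> List[str]:
        # Deliver a copy to every subscriber whose predicate matches.
        matched = []
        for subscriber, predicate in self._subscriptions:
            if predicate(publication):
                self.inboxes[subscriber].append(dict(publication))
                matched.append(subscriber)
        return matched
```

Publishers and subscribers never reference each other directly; they only share the structure of the publications.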

Many people use a five-seat car for the daily commute of a single person, which is not environmentally friendly (more traffic jams, poor energy efficiency). With this in mind, we propose to build a platform where the EPFL community can take advantage of a decentralized pub/sub system to optimize carpooling when commuting to and from the EPFL campuses.

The project consists of designing and implementing a mobile application (Android and/or iOS) to connect service providers (drivers) and consumers (riders) through a pub/sub middleware (e.g., https://kafka.apache.org/). Drivers announce their route, departure time and possibly more metadata (trip frequency, price, available seats, etc.), whereas riders specify the routes in which they are interested, possibly along with more filters (price bounds, calendar or time restrictions). Both receive a notification once there is a match. Finally, a reciprocal confirmation step seals the deal.
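The matching rule and the reciprocal confirmation could look like the following sketch. All names and fields here (Offer, Request, Deal, departure_min and so on) are assumptions made for illustration, routes are compared as plain strings, and the pub/sub middleware that would carry these objects is left out.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Offer:             # published by a driver
    driver: str
    route: str
    departure_min: int   # minutes since midnight
    price: float
    seats: int


@dataclass(frozen=True)
class Request:           # subscription stored for a rider
    rider: str
    route: str
    earliest_min: int
    latest_min: int
    max_price: float


def matches(offer: Offer, request: Request) -> bool:
    """True when the offer satisfies every filter of the request."""
    return (
        offer.seats > 0
        and offer.route == request.route
        and request.earliest_min <= offer.departure_min <= request.latest_min
        and offer.price <= request.max_price
    )


@dataclass
class Deal:
    offer: Offer
    request: Request
    confirmed_by: Set[str] = field(default_factory=set)

    @property
    def sealed(self) -> bool:
        # Sealed only once both sides have confirmed.
        return self.confirmed_by == {self.offer.driver, self.request.rider}

    def confirm(self, user: str) -> bool:
        if user not in (self.offer.driver, self.request.rider):
            raise ValueError("only the matched driver or rider can confirm")
        self.confirmed_by.add(user)
        return self.sealed
```

A real application would likely match routes geographically rather than by exact string equality, which is one of the design questions the project leaves open.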

Trust vs Obfuscate: Comparing the Scalability Potential of Privacy-Preserving Summary Aggregators

As sensing devices become widespread, privacy concerns also increase. Not knowing who can access private user data, and with what intent, can doom a computer system in terms of both adoption and adherence to laws.

Private and secure distributed computing is done either by manipulating data in a way that prevents individual identification or by relying on trusted entities to perform computations. Trusted entities can be one’s own device, data centers or other peers. These, in turn, may employ cryptographic primitives that mathematically ensure data integrity and confidentiality, assuming bug-free implementations and deployments.

Federated Learning is a model in which no central entity stores raw user data. Each edge node locally updates a shared model based on its own private data and produces a summary of the changes that these data induce in the model. A collection of such summaries is then aggregated to produce the next version of the shared model.
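The update-then-aggregate loop can be sketched as follows, assuming a tiny linear model y = w*x + b, one gradient step per node per round, and plain averaging of the summaries. The function names, the learning rate and the choice of model are illustrative assumptions, not a prescribed design.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Dataset = Sequence[Tuple[float, float]]   # (x, y) pairs private to one node


def local_summary(model: Vector, data: Dataset, lr: float = 0.1) -> Vector:
    """One least-squares gradient step on private data; returns only the model delta."""
    w, b = model
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [-lr * grad_w, -lr * grad_b]


def aggregate(model: Vector, summaries: Sequence[Vector]) -> Vector:
    """Average the summaries and apply them to the shared model."""
    k = len(summaries)
    return [m + sum(s[i] for s in summaries) / k for i, m in enumerate(model)]


def train(node_data: Sequence[Dataset], rounds: int = 200) -> Vector:
    model = [0.0, 0.0]
    for _ in range(rounds):
        # Raw data never leaves local_summary; only deltas are shared.
        summaries = [local_summary(model, data) for data in node_data]
        model = aggregate(model, summaries)
    return model
```

With two nodes holding points from the line y = 2x + 1, for example train([[(0, 1), (1, 3)], [(2, 5), (3, 7)]]), the shared model approaches w = 2 and b = 1 without either node revealing its points.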

Although this approach avoids centralizing raw user data, the summaries themselves may reveal users’ personal traits or preferences. To prevent that, this project aims at evaluating cryptographic approaches that would guarantee that the aggregation of summaries happens in a secure and private environment.

Trusted execution environments (such as Intel SGX) rely on cryptographic primitives built into hardware to offer secure enclaves for processing data, but the memory available to them is limited. Homomorphic encryption solutions, on the other hand, perform computation on ciphertexts and produce encrypted results, but they involve computationally expensive operations.
The goal of this project is to evaluate these approaches, considering their distinct threat models and practical limitations.
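To illustrate what computing on ciphertexts means for an aggregator, here is a toy version of the Paillier cryptosystem, an additively homomorphic scheme, written in plain Python. It is not the HElib API and it is not secure: the primes are tiny and hard-coded, and summaries are assumed to be already quantized to small non-negative integers. Python 3.8 or later is assumed for the modular inverse.

```python
import math
import random

# Toy key: tiny primes for illustration only; real keys use far larger primes.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)


def _L(u: int) -> int:
    return (u - 1) // N


MU = pow(_L(pow(G, LAMBDA, N2)), -1, N)   # modular inverse of L(G^LAMBDA mod N^2)


def encrypt(m: int) -> int:
    # Pick a random r that is invertible modulo N.
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return (_L(pow(c, LAMBDA, N2)) * MU) % N


def add_encrypted(ciphertexts) -> int:
    """Multiplying Paillier ciphertexts adds the underlying plaintexts (mod N)."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return total
```

An aggregator running add_encrypted sees only ciphertexts; whoever holds the private key decrypts the sum but never the individual summaries. The modular exponentiations hint at the computational cost mentioned above.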

The project consists of designing and implementing two federated machine learning summary aggregators, one with Intel SGX using its software development kit and another with homomorphic encryption libraries (e.g., https://github.com/shaih/HElib), and evaluating how they perform in terms of scalability.