Ongoing Student Projects

The following projects are currently being pursued by students in our lab and are therefore no longer available. They are published for reference and inspiration.

Car Sharing App

To share or not to share? What is the cost of privacy?

Smallworld: Gossiping with Raspberry Pis Locally and Globally

Dynamic Graph-Based Recommender System

Car Sharing App

Contact and Supervisor: Rafael Pires.

In the last decade, many successful tech companies saw a tremendous growth in users of platforms that connect service providers and consumers. Services include ride-hailing, food delivery, lodging rental and dating (well, the notion of provider and consumer might be blurry here, but the principle is the same: connecting users with matching interests).

Matching is one thing that publish/subscribe (pub/sub) systems do well. In a nutshell, users state their interests, which are stored as subscriptions for future matching. Independently, other users may publish content that matches a subset of the subscriptions. Finally, the corresponding subscribers receive a copy of the matching publication. The nice thing is that publish/subscribe systems do not necessarily have a centralized server, which obviates the need for a single company acting as intermediary.
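As a minimal sketch of the matching described above, the following in-memory broker stores subscriptions as predicates and returns, for each publication, the subscribers whose predicate accepts it. The class `Broker`, its methods and the example topics are hypothetical names chosen for illustration. A decentralized system would spread this state across peers instead of holding it in one process.

```python
from typing import Any, Callable, Dict, List, Tuple

Publication = Dict[str, Any]
Predicate = Callable[[Publication], bool]


class Broker:
    """Toy content-based pub/sub broker (hypothetical, single process)."""

    def __init__(self) -> None:
        # Each subscription pairs a subscriber name with an interest predicate.
        self.subscriptions: List[Tuple[str, Predicate]] = []

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        """Store an interest for future matching."""
        self.subscriptions.append((subscriber, predicate))

    def publish(self, publication: Publication) -> List[str]:
        """Return the subscribers whose interests match the publication."""
        return [name for name, pred in self.subscriptions if pred(publication)]


broker = Broker()
broker.subscribe("alice", lambda p: p.get("topic") == "rides")
broker.subscribe("bob", lambda p: p.get("topic") == "food")
matched = broker.publish({"topic": "rides", "from": "Lausanne"})
print(matched)
```

Publishers and subscribers never reference each other directly here: they only share the broker, which is the decoupling that makes pub/sub a natural fit for matching.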

Many people use their five-seat car for the daily commute of a single person, which is not environmentally friendly (more traffic jams, lower energy efficiency). Considering this, we propose to build a platform where the EPFL community can take advantage of a decentralized pub/sub system to optimize carpooling when commuting to and from the EPFL campuses.

The project consists in designing and implementing a mobile application (Android and/or iOS) to connect service providers (drivers) and consumers (riders) through a pub/sub middleware (e.g., https://kafka.apache.org/). Drivers will announce their route, departure time and possibly more metadata (trip frequency, price, available seats, etc.), whereas riders specify the routes they are interested in, possibly along with more filters (price bounds, calendar or time restrictions). Both receive notifications once there is a match. Finally, a reciprocal confirmation step seals the deal.
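The sketch below shows one way the driver announcements and rider filters could be modelled and matched. The types `RideOffer` and `RideRequest`, their fields, the helper functions and the example place names are all assumptions made for illustration. Real routes would need geographic matching rather than string equality.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class RideOffer:
    """Publication announced by a driver (hypothetical schema)."""
    origin: str
    destination: str
    departure: time
    price: float
    seats: int


@dataclass(frozen=True)
class RideRequest:
    """Subscription registered by a rider (hypothetical schema)."""
    origin: str
    destination: str
    earliest: time
    latest: time
    max_price: float


def matches(offer: RideOffer, request: RideRequest) -> bool:
    """True when the offer satisfies every filter of the request."""
    return (
        offer.seats > 0
        and offer.origin == request.origin
        and offer.destination == request.destination
        and request.earliest <= offer.departure <= request.latest
        and offer.price <= request.max_price
    )


def deal_sealed(driver_confirms: bool, rider_confirms: bool) -> bool:
    """Reciprocal confirmation: both sides must accept the match."""
    return driver_confirms and rider_confirms


offer = RideOffer("Lausanne", "EPFL", time(8, 0), price=3.0, seats=2)
request = RideRequest("Lausanne", "EPFL", time(7, 30), time(8, 30), max_price=5.0)
print(matches(offer, request))
```

With a middleware such as Kafka, a predicate like `matches` would run wherever subscriptions are evaluated, and a notification would be sent to both parties whenever it returns true.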

To share or not to share? What is the cost of privacy?

Contact and Supervisor: Rafael Pires.

Federated Learning has gained traction as a privacy-preserving distributed machine learning architecture because it does not move raw data. Central entities that combine user data into global machine learning models only have access to processed data, whereas users keep their (possibly sensitive) raw private information locally.

Saving on data movement is also beneficial in terms of transfer time and energy. However, these savings only materialise if the volume of processed data exchanged is lower than that of the raw data itself, which raises the following questions:

  • When does that happen? In other words, what makes the amount of processed data smaller or bigger than the raw data on top of which ML models are built?
  • What is the influence of the nature of the data (item ratings, e-mails, pictures), the problem (recommendation, classification) and the ML approach (DNN, matrix factorisation, logistic regression) on the ratio of processed to raw data volume?
  • If we were in a perfect world with no malicious adversaries, would we benefit from raw data sharing in distributed and decentralised machine learning systems? How?

The goal of this project is to (partially) answer these questions through an empirical approach, i.e., by designing, implementing, measuring and evaluating a set of different prototypes and comparing the outcomes.
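To make the first question concrete, here is a back-of-the-envelope sketch for item ratings with matrix factorisation. It rests on assumptions that are not part of the project description: a raw rating takes 8 bytes (a 32-bit item identifier plus a 32-bit rating), and in every training round the user uploads one float32 gradient row of dimension `embedding_dim` per rated item. The function names are hypothetical, and real protocols may compress or sample updates.

```python
def raw_bytes(num_ratings: int, bytes_per_rating: int = 8) -> int:
    """Bytes needed to upload the raw ratings once (assumed encoding)."""
    return num_ratings * bytes_per_rating


def update_bytes(num_ratings: int, embedding_dim: int, rounds: int,
                 bytes_per_float: int = 4) -> int:
    """Bytes uploaded as gradient rows over all rounds (assumed protocol)."""
    return num_ratings * embedding_dim * bytes_per_float * rounds


def processed_to_raw_ratio(num_ratings: int, embedding_dim: int,
                           rounds: int) -> float:
    """Values above 1 mean the processed data outweighs the raw data."""
    return (update_bytes(num_ratings, embedding_dim, rounds)
            / raw_bytes(num_ratings))


# 100 ratings, 16-dimensional embeddings, 10 rounds.
print(processed_to_raw_ratio(100, 16, 10))  # 80.0
```

Under these assumptions the ratio is `embedding_dim * rounds / 2` regardless of the number of ratings, so the processed data is smaller than the raw data only when `embedding_dim * rounds < 2`. Other data types, such as pictures, would shift the balance because the raw data is much larger per sample.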

Smallworld: Gossiping with Raspberry Pis Locally and Globally

Contact and Supervisor: Erick Lavoie.

Digital connectivity is limited or too expensive in many usage scenarios. In situations of humanitarian crises, Internet connectivity may not be available to displaced populations in need of staying in touch with their family. In low- and middle-income countries, mobile or satellite data plans might be unaffordable to many. In high-income countries, direct connectivity between user devices, such as smartphones and computers, is hampered in multiple ways: iOS and Android smartphones implement incompatible protocols, network security policies in large organizations block access to ports of other local devices, and routers implement Network Address Translation that hides potential peers across the Internet.

The goal of this project is to make digital communication simple and affordable by providing peer-to-peer data replication using gossip algorithms (e.g., [3]) running on Raspberry Pis (RPi). This will enable local-first collaborative applications [1,2] and peer-to-peer protocols [4,5,6] to work both locally and globally, effectively providing small-world connectivity to communities. The project consists in implementing and evaluating the following scenarios:

1. Local (Internet Optional)

Users meet or are in close proximity (few meters) and bring their devices close together. Updates happen between:

  1. Device (smartphone, laptop, etc.) and RPi, over an RPi Wifi hotspot, RPi Wifi Direct, a third-party Wifi hotspot, or USB/Ethernet
  2. Two RPis, over Wifi Direct

2. Global (with Internet Connectivity)

Users in different regions connect their RPi to the Internet over Wifi. Updates happen between:

  1. RPi and Device/RPi through an overlay bridge (e.g., SSB Room, Hyperswarm)

Each of these scenarios will be tested with applications based on peer-to-peer protocols, such as Secure-Scuttlebutt, Dat/Hypercore, or others depending on the student's interests. Once connectivity has been demonstrated, we will design simple user interactions, potentially using hardware extensions, that make transitions between the different modes seamless. For Bachelor students, the project can conclude with a report explaining the design choices, replication steps, and experiment results to encourage adoption by many open source projects. Students will be encouraged, but not required, to contribute their results as an extension of the PeachCloud project.

For Master students, or Bachelor students who achieve the previous goals quickly, the project can additionally contribute more detailed performance experiments and new optimization techniques to reduce storage and bandwidth usage. It could also include the design, implementation, and testing of new broadcast protocols [7] designed to work over short ranges (tens of meters) with Wifi UDP Multicast (Proof-of-Concept), medium ranges (1-2 km) with Meshtastic (LoRa), and long ranges (hundreds to thousands of km) with Digital Modes over Amateur Radio (e.g., Pi Build, Tutorial). It could also involve other application scenarios, such as decentralized stochastic optimization [8] for machine learning. Interesting results will be submitted as research publications.
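To illustrate the kind of replication this project builds on, here is a minimal in-memory sketch of gossip over append-only logs. It is not the Secure-Scuttlebutt protocol or the algorithm of [3]: there are no signatures, no networking and no follow graph, and the names `Peer`, `exchange` and `gossip_until_converged` are hypothetical. Each peer keeps one append-only log per author, and two peers synchronise by comparing how many messages they hold for each author and sending the missing suffixes.

```python
import random
from typing import Dict, List

Logs = Dict[str, List[str]]  # author id -> append-only list of messages


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.logs: Logs = {}

    def append(self, message: str) -> None:
        """Append a new message to this peer's own log."""
        self.logs.setdefault(self.name, []).append(message)

    def frontier(self) -> Dict[str, int]:
        """Number of messages known for each author."""
        return {author: len(log) for author, log in self.logs.items()}

    def missing_for(self, other_frontier: Dict[str, int]) -> Logs:
        """Log suffixes that the other peer has not seen yet."""
        return {
            author: log[other_frontier.get(author, 0):]
            for author, log in self.logs.items()
            if len(log) > other_frontier.get(author, 0)
        }

    def receive(self, updates: Logs) -> None:
        """Extend local logs with the suffixes sent by another peer."""
        for author, suffix in updates.items():
            self.logs.setdefault(author, []).extend(suffix)


def exchange(a: Peer, b: Peer) -> None:
    """One push-pull exchange: both suffix sets are computed before applying."""
    to_b = a.missing_for(b.frontier())
    to_a = b.missing_for(a.frontier())
    b.receive(to_b)
    a.receive(to_a)


def gossip_until_converged(peers: List[Peer], rng: random.Random,
                           max_rounds: int = 1000) -> int:
    """Pick random pairs until all peers hold identical logs."""
    for round_number in range(1, max_rounds + 1):
        a, b = rng.sample(peers, 2)
        exchange(a, b)
        if all(p.logs == peers[0].logs for p in peers):
            return round_number
    raise RuntimeError("peers did not converge")


peers = [Peer(f"rpi{i}") for i in range(5)]
for p in peers:
    p.append(f"hello from {p.name}")
rounds = gossip_until_converged(peers, random.Random(0))
print(f"converged after {rounds} pairwise exchanges")
```

In the scenarios above, a pairwise exchange would correspond to two devices meeting over a Wifi hotspot, Wifi Direct or an overlay bridge. The random pairing here only stands in for those encounters.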

Research Papers and References

[1] Peter van Hardenberg and Martin Kleppmann: “PushPin: Towards Production-Quality Peer-to-Peer Collaboration”. 7th Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC), April 2020. doi:10.1145/3380787.3393683

[2] Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan: “Local-first software: You own your data, in spite of the cloud”. ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! ’19), October 2019. doi:10.1145/3359591.3359737

[3] Anne-Marie Kermarrec, Erick Lavoie, Christian Tschudin: “Gossiping with Append-Only Logs in Secure-Scuttlebutt”. Distributed Infrastructure for the Common Good. Delft, The Netherlands. December, 2020. Pre-print

[4] Dominic Tarr, Erick Lavoie, Aljoscha Meyer, Christian Tschudin: “Secure Scuttlebutt: An Identity-Centric Protocol for Subjective and Decentralized Applications”. Information-Centric Networking (ICN). Macao, China. September, 2019. doi:10.1145/3357150.3357396

[5] Maxwell Ogden, Karissa McKelvey, and Mathias Buus Madsen: “Dat – Distributed Dataset Synchronization And Versioning”. 2017. doi:10.31219/osf.io/nsv2c

[6] Paul Frazee, Andre Wosh, and Mathias Buus Madsen: “Hypercore Protocol”. 2020. https://hypercore-protocol.org/

[7] Christian Tschudin: “A Broadcast-Only Communication Model Based on Replicated Append-Only Logs”. SIGCOMM Comput. Commun. Rev. 49, 2 (April 2019), 37–43. doi:10.1145/3336937.3336943

[8] Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, and Sebastian U. Stich: “A Unified Theory of Decentralized SGD with Changing Topology and Local Updates”. 2020. arXiv preprint arXiv:2003.10422

Dynamic Graph-Based Recommender System

Contact and Supervisor: Othmane Safsafi.

Recommender systems are key to the success of today’s tech industry. The goal of a recommender system is to rank items for a specific user in order to provide them with the items best suited to their needs. A classical example is the Netflix recommender system, which provides each user with a list of movies they might like. Such a list must be personalized, accurate, quick to compute and cost-effective.

In many settings, such as news recommendation, it is of the utmost importance for a recommender to be scalable (able to cope with huge volumes of data and users) and reactive (able to quickly adapt to changes, such as the frequent addition of data and users). The goal of this project is to assess the capability of recommenders to cope with such a high level of dynamicity in the input data, as new data are added at short, regular intervals. This project will involve comparing several approaches along these capabilities, starting with graph-based approaches.
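As a minimal sketch of what a graph-based approach can look like, the example below keeps a bipartite user-item graph that is updated one interaction at a time, and ranks unseen items by the number of two-hop paths that reach them through users with shared items. This is one simple baseline chosen for illustration, not the method the project will necessarily evaluate. The class `GraphRecommender` and the user and item identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


class GraphRecommender:
    """Bipartite user-item graph with incremental edge insertion."""

    def __init__(self) -> None:
        self.items_of: Dict[str, Set[str]] = defaultdict(set)
        self.users_of: Dict[str, Set[str]] = defaultdict(set)

    def add_interaction(self, user: str, item: str) -> None:
        """Insert one user-item edge; no retraining step is required."""
        self.items_of[user].add(item)
        self.users_of[item].add(user)

    def recommend(self, user: str, k: int = 3) -> List[str]:
        """Rank unseen items by co-occurrence with the user's items."""
        seen = self.items_of.get(user, set())
        scores: Counter = Counter()
        for item in seen:
            for neighbour in self.users_of[item]:
                if neighbour == user:
                    continue
                for candidate in self.items_of[neighbour]:
                    if candidate not in seen:
                        scores[candidate] += 1
        # Sort by descending score, then by name to break ties deterministically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:k]]


rec = GraphRecommender()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"),
                   ("u3", "b"), ("u3", "c"), ("u3", "d")]:
    rec.add_interaction(user, item)
first = rec.recommend("u1")

# New data arrives: the next query reflects it immediately.
rec.add_interaction("u4", "b")
rec.add_interaction("u4", "e")
second = rec.recommend("u1")
print(first, second)
```

Insertion is cheap, which addresses reactivity, but the query cost grows with the size of the two-hop neighbourhood. That trade-off is the kind of behaviour a comparison of approaches would need to measure.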