Student Projects

The following projects are available for Master's and Bachelor's students. Each is carried out in close collaboration with an experienced member of the lab. To apply for a project, send an email to the contact listed for that project.

You may also suggest new projects, ideally close to our ongoing or previously completed projects. In that case, you will have to convince Anne-Marie Kermarrec that the project is worthwhile and of reasonable scope, and that someone in the lab can mentor you!

Summary

Performance and Economics of k-NN on a Cluster of Raspberry Pi 4

Performance and Economics of k-NN on a Cluster of Raspberry Pi 4

Contact and Supervisor: Erick Lavoie (also CC Anne-Marie Kermarrec).

In the last 15 years, renting cloud infrastructure has often been cheaper and less risky than acquiring and managing computing infrastructure, in part because it lowers the risk of buying too much computing capacity in anticipation of growth. However, with (1) the slowing of Moore’s law and the convergence of single-core performance between lower-end devices and the servers available in clouds; (2) the larger economies of scale available when a single low-cost and open platform, such as the Raspberry Pi 4, is adopted in the consumer, Internet-of-Things, and industrial embedded markets at the same time; and (3) the wide availability of open source distributed computing software platforms that can be self-hosted, the most favorable economics seem to be shifting.

Today, compared to a high-end, high-performance server, a cluster of Raspberry Pis can offer 2-3x the total computing throughput and twice the amount of RAM for the same price. Compared to renting cloud infrastructure, a cluster of Raspberry Pis can pay for itself in 2-4 years, including electricity costs (but not maintenance). Because individual devices are widely available and low-cost (<100 USD each), computing capacity can be ramped up progressively with a much lower risk of over-provisioning. Because Moore’s law is slowing, it will take longer for devices acquired today to become uncompetitive with newer generations, so their acquisition costs can be amortized over a longer period. Finally, the absolute performance of a Raspberry Pi 4 is comparable to that of the commodity hardware used by Google in 2005, but the devices are 4-5x more energy efficient and probably also at least 4-5x less expensive [1], so they can certainly implement services at least similar to those offered then, at a lower cost.
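The payback reasoning above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `payback_years`, its parameters, and all the numbers in the usage note are hypothetical placeholders, not measurements from the lab, and maintenance is ignored, as in the text.

```python
def payback_years(acquisition_cost, power_watts, price_per_kwh, cloud_cost_per_month):
    """Years until an owned cluster has paid for itself compared to renting.

    All arguments are assumptions supplied by the caller:
      acquisition_cost      one-time purchase cost of the cluster (e.g. in USD)
      power_watts           average power draw of the whole cluster
      price_per_kwh         electricity price per kWh
      cloud_cost_per_month  rental cost of comparable cloud capacity
    Returns None when renting is never more expensive than electricity alone.
    """
    hours_per_year = 24 * 365
    electricity_per_year = power_watts / 1000 * hours_per_year * price_per_kwh
    cloud_per_year = 12 * cloud_cost_per_month
    savings_per_year = cloud_per_year - electricity_per_year
    if savings_per_year <= 0:
        return None  # the cluster never pays for itself under these inputs
    return acquisition_cost / savings_per_year
```

For example, with a made-up 1200 USD cluster drawing 100 W, electricity at 0.25 USD/kWh, and a 50 USD/month cloud bill, `payback_years(1200, 100, 0.25, 50)` gives a little over three years. Part of the project is to replace such placeholders with measured values.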

However, compared to current cloud and cluster environments, Raspberry Pis typically have slower RAM and slower network interfaces, so memory- or communication-intensive applications may suffer significant performance degradation when ported from those environments to a Raspberry Pi cluster. A Raspberry Pi cluster may therefore not be the best hardware infrastructure for all of today’s distributed applications. The goal of this project is to characterize the performance and economics of a cluster of Raspberry Pi 4 devices to identify which applications may benefit most from the underlying shift in economics.

As a first candidate distributed application, you will implement personalized recommendations, such as those currently provided for movies, with the k-Nearest Neighbors algorithm. You will also implement the same application on the IC Cluster, both on a high-end machine available as Hardware-as-a-Service and in virtualized containers. You will then compare all implementations on processing speed, acquisition and running costs, and energy consumption. For energy consumption, we will discuss with the IC IT team to see if we can obtain numbers for their infrastructure for a fair comparison.
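To give an idea of the kind of computation involved, here is a minimal single-machine sketch of user-based k-NN rating prediction. It assumes cosine similarity over sparse rating vectors and a similarity-weighted average for prediction; these choices, the function names, and the toy data are illustrative assumptions, not the project's specification, and the distributed version is the actual subject of the project.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dict: item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def k_nearest_neighbors(ratings, user, k):
    """Return the k users most similar to `user` as (similarity, other_user) pairs."""
    sims = [
        (cosine_similarity(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user
    ]
    sims.sort(reverse=True)  # highest similarity first
    return sims[:k]

def predict(ratings, user, item, k):
    """Similarity-weighted average of the neighbors' ratings for `item`.

    Returns None when no neighbor with positive total similarity rated the item.
    """
    rated = [(s, o) for s, o in k_nearest_neighbors(ratings, user, k)
             if item in ratings[o]]
    total_similarity = sum(s for s, _ in rated)
    if total_similarity <= 0:
        return None
    return sum(s * ratings[o][item] for s, o in rated) / total_similarity

# Toy data (made up): user -> {movie: rating}
ratings = {
    "alice": {"m1": 5, "m2": 1},
    "bob":   {"m1": 5, "m2": 1, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
```

With `k=1`, Alice's only neighbor is Bob, whose tastes match hers most closely, so `predict(ratings, "alice", "m3", 1)` returns Bob's rating of `m3`. Computing all pairwise similarities is the step that becomes memory- and communication-intensive at scale, which is why it is a useful stress test for the cluster.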

You should be comfortable implementing an algorithm from a high-level mathematical description, and debugging and measuring the performance of distributed applications. If you have completed the CS449: Systems for Data Science course with a good grade, you would be an ideal candidate.

References

[1] Google machine. https://google-services.blogspot.com/2006/07/google-machine.html, Accessed: 2021-01-21.