The following is a list of potential projects for semester or master-thesis projects in DBLAB and SC. If you have additional proposals, don’t hesitate to contact us (Amir Shaikhha and Lionel Parreaux).
Data Compression Schemes as Compiler Transformations
Compression algorithms can improve the runtime performance and memory usage of database systems. This improvement is more obvious in the context of column-store database systems .
On the other hand, with the advent of in-memory database systems, query compilers are getting more important. As a result, using compilation techniques in database development is gaining more attention [2, 3]. DBLAB [4, 5] is a framework for building database systems using high-level programming, producing optimized code in a low-level language like C. This is achieved through compilation techniques implemented in an extensible optimizing compiler called SC (Systems Compiler) .
The suggested project combines these two aspects. To be more concrete, we would like to implement different compression schemes using compilation transformations, which will be encoded using SC and added to DBLAB.
Alternative Backends for DBLAB
DBLAB currently contains two code generators: one targeting Scala, and the other targeting C.
The aim of this project is to add other backends to DBLAB.
One possibility would be to add an LLVM code generator (for example by using Scala Native). Furthermore, one should exploit vectorization possibilities while generating LLVM code.
Another possibility would be to add a Truffle  code generator to DBLAB. Truffle  is a framework provided by Oracle Labs for building self-optimizing AST interpreters.
Automatically synthesizing Spark programs
There are different ways to write a query processing algorithm (such as a join algorithm) using Spark. For different data sizes and based on the number of worker nodes, the best Spark implementation may vary. The aim of this project is to use program synthesis techniques  to automatically derive the most efficient Spark program based on the given data information and the given platform.
Using Scala.meta for simplifying DSL embedding
Domain-specific languages (DSLs) are languages specialized for a specific domain (such as SQL for database queries and management). Writing the whole compilation pipeline for a given language requires a huge amount of work. An alternative solution is to reuse the compilation pipeline of an existing “host” language and define the new “object” language using the syntax of that host language. This is known as “embedding” the object language into a host language.
There are several frameworks available to help in defining embedded DSLs in Scala. However, for the language developers, the task still requires writing a lot of boilerplate. Yin-Yang suggested to use Scala reflection/macros to automatically generate the needed boilerplate (Section 6 of ). The SC framework brought this idea further and automatically generated the optimization passes. The aim of this project is to investigate the facilities provided by Scala.meta  to improve the DSL embedding process.
 Abadi, Daniel, Samuel Madden, and Miguel Ferreira. “Integrating compression and execution in column-oriented database systems.” Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 2006.
 Klonatos, Yannis, et al. “Building efficient query engines in a high-level language.” Proceedings of the VLDB Endowment 7.10 (2014): 853-864.
 Koch, Christoph. “Abstraction without regret in database systems building: a manifesto.” IEEE Data Engineering Bulletin 37.EPFL-ARTICLE-197359 (2014).
 Shaikhha, A., Klonatos, I., Parreaux, L. E. V., Brown, L., Dashti Rahmat Abadi, M., & Koch, C. (2016). How to Architect a Query Compiler. In SIGMOD 2016
 One VM to rule them all, http://dl.acm.org/citation.cfm?id=2509581
 Automatic synthesis of out-of-core algorithms, http://dl.acm.org/citation.cfm?id=2465334
 Yin-Yang: concealing the deep embedding of DSLs, http://dl.acm.org/citation.cfm?id=2658771