DBLAB Projects

Student Projects

The following is a list of potential projects for semester or master-thesis projects in DBLAB and SC. If you have additional proposals, don’t hesitate to contact us (Amir Shaikhha and Lionel Parreaux).

Data Compression Schemes as Compiler Transformations


Compression algorithms can improve the runtime performance and memory usage of database systems. This improvement is more obvious in the context of column-store database systems [1]. 

On the other hand, with the advent of in-memory database systems, query compilers  are getting more important. As a result, using compilation techniques in database development is gaining more attention [2, 3]. DBLAB [4, 5] is a framework for building database systems using high-level programming, producing optimized code in a low-level language like C. This is achieved through compilation techniques implemented in an extensible optimizing compiler called SC (Systems Compiler) [6].

The suggested project combines these two aspects. To be more concrete, we would like to implement different compression schemes using compilation transformations, which will be encoded using SC and added to DBLAB.




Alternative Backends for DBLAB


DBLAB currently contains two code generators: one targeting Scala, and the other targeting C.

The aim of this project is to add other backends to DBLAB.

One possibility would be to add an LLVM code generator (for example by using Scala Native). Furthermore, one should exploit vectorization possibilities while generating LLVM code.

Another possibility would be to add a Truffle [7] code generator to DBLAB. Truffle [7] is a framework provided by Oracle Labs for building self-optimizing AST interpreters.  




Automatically synthesizing Spark programs


There are different ways to write a query processing algorithm (such as a join algorithm) using Spark. For different data sizes and based on the number of worker nodes, the best Spark implementation may vary. The aim of this project is to use program synthesis techniques [8] to automatically derive the most efficient Spark program based on the given data information and the given platform.




Using Scala.meta for simplifying DSL embedding


Domain-specific languages (DSLs) are languages specialized for a specific domain (such as SQL for database queries and management). Writing the whole compilation pipeline for a given language requires a huge amount of work. An alternative solution is to reuse the compilation pipeline of an existing “host” language and define the new “object” language using the syntax of that host language. This is known as “embedding” the object language into a host language.

There are several frameworks available to help in defining embedded DSLs in Scala. However, for the language developers, the task still requires writing a lot of boilerplate. Yin-Yang suggested to use Scala reflection/macros to automatically generate the needed boilerplate (Section 6 of [9]). The SC framework brought this idea further and automatically generated the optimization passes. The aim of this project is to investigate the facilities provided by Scala.meta [10] to improve the DSL embedding process.




[1] Abadi, Daniel, Samuel Madden, and Miguel Ferreira. “Integrating compression and execution in column-oriented database systems.” Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 2006.


[2] Klonatos, Yannis, et al. “Building efficient query engines in a high-level language.” Proceedings of the VLDB Endowment 7.10 (2014): 853-864.


[3] Koch, Christoph. “Abstraction without regret in database systems building: a manifesto.” IEEE Data Engineering Bulletin 37.EPFL-ARTICLE-197359 (2014).


[4] Shaikhha, A., Klonatos, I., Parreaux, L. E. V., Brown, L., Dashti Rahmat Abadi, M., & Koch, C. (2016). How to Architect a Query Compiler. In SIGMOD 2016


[5] https://github.com/epfldata/dblab


[6] https://github.com/epfldata/sc-public


[7] One VM to rule them all, http://dl.acm.org/citation.cfm?id=2509581


[8] Automatic synthesis of out-of-core algorithms, http://dl.acm.org/citation.cfm?id=2465334


[9] Yin-Yang: concealing the deep embedding of DSLs, http://dl.acm.org/citation.cfm?id=2658771


[10] https://github.com/scalameta/scalameta