Research themes and interests
Compilers meet databases. We work on applying ideas from programming languages and compilers to build better data management systems. Projects/systems: DBToaster, SC, DBLab, Legorithmics, Squid and dbStage.
Analytics, machine learning, and managing uncertainty. Projects: LINVIEW, DBLab, DBToaster, MayBMS.
Scalable and massively parallel query processing. Like everyone else, we are interested in scalable processing of big data. Our angle is shaped by the strengths and interests listed above. Projects: Squall, LINVIEW, Openplum, DBToaster.
- Squid (2015–). Type-safe metaprogramming/compiler framework for Scala. See the project page.
- dbStage (2017–). Language-integrated, modular database system using Squid. See the project page.
- DBToaster: aggressive compilation for incremental data processing (2009–). In the DBToaster project, we develop aggressive compilation techniques for database query processing, built on highly efficient incremental query evaluation. We also work on generating massively parallel data management systems based on lightweight components for data analysis in the cloud. Acknowledgments: This is joint work with our alumni at Johns Hopkins University and the University at Buffalo. See http://www.dbtoaster.org.
- LINVIEW: automatically incremental iterated linear algebra (2013–). See the project page.
- DBLab: building the fastest possible database engines in high-level languages (2013–). See the project page. DBLab makes use of the SC DSL compiler framework, which we develop in the lab; see https://github.com/epfldata/sc-public.
- Legorithmics: synthesis of software systems (2011–). In this project, we automatically synthesize components of data management systems, such as efficient out-of-core algorithms and concurrency-control algorithms. See the project homepage.
- Youtopia: optimizing coordination and synchronization in distributed (data management) systems (2009–). In the Youtopia project, we design declarative, easy-to-use languages for specifying and solving coordination problems, which increasingly occur in social Web applications. We build systems that generalize database transactions, selectively giving up isolation to allow coordination among database applications and users.
- Squall: an online, massively parallel query engine (2010–). Squall is based on Apache Storm and has been open-sourced. See https://github.com/epfldata/squall.
- ALGILE: algebraic techniques for agile data management systems (2012–, ERC project 279804). This is an umbrella project that subsumes DBToaster and Legorithmics but also goes beyond them. Several other subprojects are currently in an incubation stage.
- MARVEL: computational materials science (2014–). MARVEL is a Swiss National Research Center on computational materials science. We work on using DSL compiler techniques for optimizing high-performance computing code and on advanced analytics for materials science.
MayBMS (2005–2011). We studied the management and processing of uncertain and probabilistic data and developed the MayBMS probabilistic database management system. This system extends the PostgreSQL codebase and is available as open source. See the project website, which contains both our publications and the code.
Openplum (2012/13) is a scalable parallel database system based on PostgreSQL that is inspired by the design of the Greenplum database system. This started as a course project in our Advanced Databases course and was expanded and open-sourced. Get it from GitHub.
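As a rough illustration of the incremental query evaluation idea mentioned under DBToaster above, the following sketch keeps the result of one aggregate join query up to date as tuples arrive, instead of recomputing it from scratch. It is a simplified, hand-written example under our own assumptions, not code generated by DBToaster or taken from it: the query (`SUM(r.a * s.b)` over `R(k, a)` joined with `S(k, b)` on `k`), the class `IncrementalJoinSum`, and the helper `recompute` are all hypothetical names chosen for this illustration.

```python
from collections import defaultdict


class IncrementalJoinSum:
    """Hypothetical example: maintains Q = SUM(r.a * s.b)
    over R(k, a) JOIN S(k, b) ON k, one update at a time."""

    def __init__(self):
        self.total = 0                  # current value of Q
        self.sum_a = defaultdict(int)   # auxiliary view: SUM(a) per key over R
        self.sum_b = defaultdict(int)   # auxiliary view: SUM(b) per key over S

    def update_r(self, k, a, mult=1):
        # mult=1 inserts the tuple R(k, a); mult=-1 deletes it.
        # The change to Q depends only on the matching per-key sum over S.
        self.total += mult * a * self.sum_b[k]
        self.sum_a[k] += mult * a

    def update_s(self, k, b, mult=1):
        # Symmetric case for S(k, b).
        self.total += mult * self.sum_a[k] * b
        self.sum_b[k] += mult * b


def recompute(r_rows, s_rows):
    """Non-incremental reference: evaluate Q from scratch."""
    return sum(a * b for (k1, a) in r_rows for (k2, b) in s_rows if k1 == k2)


if __name__ == "__main__":
    view = IncrementalJoinSum()
    view.update_r(1, 2)
    view.update_s(1, 5)
    view.update_r(1, 3)
    view.update_s(2, 7)
    view.update_r(1, 2, mult=-1)  # delete the first R tuple again
    # Both ways of computing Q agree on the final database state.
    print(view.total == recompute([(1, 3)], [(1, 5), (2, 7)]))
```

Each update touches only a constant number of map entries, whereas `recompute` scans both relations. The auxiliary per-key sums are what make the constant-time delta possible in this particular example; more complex queries would need different auxiliary views.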