Impressions from HPTS 2015 - DIAS - EPFL

HPTS (High Performance Transaction Systems, http://hpts.ws/) is an informal workshop series that brings together a diverse group of database system researchers and practitioners. It started 30 years ago with a focus on transaction processing systems and has since evolved to encompass all aspects of large-scale systems. The workshop's main attraction is its small size (fewer than 100 participants) and its mix of people from industry and academia at all levels of seniority. This results in many lively discussions that often stretch long into the night. The event takes place every odd-numbered year in early fall in Asilomar, CA, and runs from Sunday afternoon until Wednesday morning.

This year’s event was held from the 27th until the 30th of September. Unlike previous editions, it featured no panel discussions and instead devoted more time to presentations, which were of very high quality. Despite the word “transactions” appearing in the title, only one of the long talk sessions was devoted to transactions. It covered high-performance distributed transaction processing, optimizations for flash storage, and testing the correctness of emerging distributed transaction implementations for cloud environments. Transactions were much more popular in the gong show session, where many talks discussed a variety of system and application aspects. Some participants were concerned that we are just revisiting problems solved decades ago, and that our systems already provide enough performance for all human-generated (high-value) transactions. Despite this, participants identified many application areas that need systems combining efficient short transactions with other types of data processing, including long-running updates, complex analytics and machine learning. However, the semantics of transactions in this context, and how to use them to build efficient systems, remain open problems.

An overarching theme of many presentations this year was the inevitability of moving data management systems to the cloud. In this environment, system designers need to treat security, fault tolerance and elasticity as first-class concerns. Fine-grained instrumentation and monitoring are essential tools for achieving this, and we heard many talks about the challenges of achieving predictability and ensuring quality of service in distributed systems. Operational aspects, including deployment, configuration and debugging, remain challenging, but containers and the associated orchestration technology promise to solve many of these issues.

The proliferation of monitoring applications, both for the Internet of Things and in datacenters, has led to renewed interest in stream processing. Unlike previous generations of streaming systems, modern systems support a wider variety of analytical computations in real time. This eliminates the need to first ingest the data into a separate data analytics system.

The growing volume of data stored across a multitude of systems emphasizes the need to integrate data from various sources. One long-standing challenge is ensuring data quality, which in many domains still requires manual data processing. It is generally acknowledged that it is not enough to simply dump the data into a Hadoop data lake and expect systems higher up the stack to process it efficiently. In practice, those systems often need to convert the data to a more suitable format, which recreates the same issues as traditional data warehouses. Modern systems take a more dynamic approach. They keep data in situ and either integrate it in a middleware query layer or even generate query processing pipelines just-in-time for maximum efficiency.

Finally, efficiency was a goal everyone was aiming for, although it meant different things in different contexts. In particular, we heard talks about system designs that exploit abundant parallelism and features of modern processors such as hardware transactional memory, as well as emerging non-volatile memory. In the distributed systems space, efficiency concerns were mostly about resource utilization. This inspired better storage layouts, the use of code compilation techniques, and fine-grained memory management to avoid the unpredictability of default garbage collection mechanisms.

Overall, this year’s HPTS was a great event with many opportunities for discussion with researchers and practitioners from both academia and industry, and we’re looking forward to the next workshop in 2017.

by Danica Porobic