Stream Processing Systems ‒ DIAS ‐ EPFL

Streaming data are typically generated by thousands of data sources that send records simultaneously and at high rates. They cover a wide range, from e-commerce purchases and in-game player activity to information from social networks and financial trading floors. These data need to be processed sequentially and incrementally, either record by record or over time windows, to provide useful insights. More importantly, streaming systems need to run continuously and remain performant while the workload, the input rates, and even the underlying hardware change at runtime.

We study streaming engines and are particularly interested in:

– Designing self-tuned and self-repairing systems through algorithms that adapt to the data distribution, the input rate, and the available hardware.

– Designing systems that utilize modern hardware (e.g., RDMA) to improve performance.

– Developing techniques that transparently combine and co-optimize batch and streaming components.



Dalton: Learned Partitioning for Distributed Data Streams

E. Zapridou; I. Mytilinis; A. Ailamaki

2022. International Conference on Very Large Data Bases (VLDB 2022), Sydney, Australia, September 5-9, 2022. p. 491-504. DOI: 10.14778/3570690.3570699.
