Stream Processing Systems

 

Streaming data are typically generated by thousands of data sources that send records simultaneously and at high rates. They span a wide range of applications, from e-commerce purchases and in-game player activity to information from social networks and financial trading floors. To provide useful insights, these data need to be processed sequentially and incrementally, either record by record or over time windows. More importantly, streaming systems need to run continuously and remain performant while the workload, the input rates, and even the underlying hardware change at runtime.
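To make the idea of incremental processing over time windows concrete, here is a minimal Python sketch. It is only an illustration, not the design of any particular engine: the function name `tumbling_window_counts`, the `(timestamp, key)` record layout, and the assumption that records arrive in timestamp order are all hypothetical choices made for this example.

```python
from collections import defaultdict


def tumbling_window_counts(records, window_size):
    """Count records per key in fixed-size (tumbling) time windows.

    Assumption for this sketch: `records` is an iterable of
    (timestamp, key) pairs that arrive in timestamp order.
    Yields (window_start, counts) each time a window closes.
    """
    current_start = None
    counts = defaultdict(int)
    for timestamp, key in records:
        # Map the record to the start of the window that contains it.
        window_start = (timestamp // window_size) * window_size
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # A record for a later window closes the current one.
            yield current_start, dict(counts)
            current_start = window_start
            counts = defaultdict(int)
        # Incremental update: one record at a time, no re-scan of history.
        counts[key] += 1
    if current_start is not None:
        # Flush the last open window when the input ends.
        yield current_start, dict(counts)


# Hypothetical input: (timestamp in seconds, event type).
events = [(0, "purchase"), (3, "click"), (4, "purchase"),
          (11, "click"), (25, "purchase")]
results = list(tumbling_window_counts(events, window_size=10))
```

With a window size of 10, the sketch emits one result per non-empty window (starting at 0, 10, and 20). A real engine would additionally have to handle out-of-order records, unbounded inputs, and distribution across workers, which this example deliberately leaves out.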

 

We study streaming engines and are particularly interested in:

– Designing self-tuning and self-repairing systems through algorithms that adapt to the data distribution, the input rate, and the available hardware.

– Designing systems that exploit modern hardware (e.g., RDMA) to improve performance.

– Developing techniques that transparently combine and co-optimize batch and streaming components.

(source of the image: https://dbconvert.com/blog/data-stream-processing/)

Dalton: Learned Partitioning for Distributed Data Streams

E. Zapridou; I. Mytilinis; A. Ailamaki 

2022. International Conference on Very Large Data Bases (VLDB 2022), Sydney, Australia, September 5-9, 2022. p. 491-504. DOI: 10.14778/3570690.3570699.