Experimental evaluation of distributed stream processing systems

Project Details

Laboratory: LSIR | Semester / Master | Completed

Description

The volume of time-series data is growing dramatically as the number of devices producing it has exploded. Various applications (e.g. Twitter, sensor networks, stock markets) produce huge numbers of streaming time series. Just as MapReduce became the prevalent distributed processing paradigm for static data, several distributed real-time computation systems (e.g. S4, Storm and Hadoop Online) have been designed to process data streams.

A stream is pipelined through a number of processing steps, called operators (e.g. computing the average CO2 concentration at the city center over the last hour). Stream processing is used for important tasks such as day trading in stock markets, national security surveillance (US) and network security. A stream processing platform should balance the load across the cluster so that resources are well utilized at all times, and it should provide performance bounds for queries.
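To make the notion of an operator concrete, here is a minimal sketch of a time-based sliding-window average, in the spirit of the "average over the last hour" example. The class name, method names and eviction rule are assumptions made for illustration; they are not taken from S4, Storm or Hadoop Online. The sketch also assumes that readings arrive in non-decreasing timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window average operator (illustrative only). */
class SlidingWindowAverage {
    /** One timestamped reading kept in the window buffer. */
    private static final class Reading {
        final long timestampMillis;
        final double value;

        Reading(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    private final long windowMillis;
    private final Deque<Reading> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindowAverage(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /**
     * Adds a reading, evicts readings that fell out of the window
     * (timestamp <= newest timestamp - window length), and returns
     * the average of the readings that remain.
     * Assumes timestamps are non-decreasing across calls.
     */
    double add(long timestampMillis, double value) {
        window.addLast(new Reading(timestampMillis, value));
        sum += value;
        while (window.peekFirst().timestampMillis <= timestampMillis - windowMillis) {
            sum -= window.removeFirst().value;
        }
        // The newest reading is never evicted, so the window is non-empty here.
        return sum / window.size();
    }
}
```

With a one-hour window (3,600,000 ms), feeding the readings 400, 420 and 440 at 0, 30 and 60 minutes yields averages of 400, 410 and 430: the first reading is evicted once it is exactly one hour old. Keeping a running sum makes each update cheap, at the cost of some floating-point drift on very long streams.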

In this project, we aim to experimentally compare three open-source stream processing platforms: Yahoo S4, Twitter Storm and Hadoop Online. Example queries include window-based averages, window-based joins and other stream queries. Each query to be tested should consist of a number of simple operators that the data passes through, e.g. select, join, average.
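As a rough illustration of such a query, the sketch below chains a selection, a window-based join of two streams on a key, and a running average over the joined pairs. All names (SelectJoinAverage, Event, onLeft, onRight) and the choice to average the difference between the left and right values are hypothetical. None of this is an API of S4, Storm or Hadoop Online, and the sketch assumes that timestamps are non-decreasing across both streams.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical select -> window join -> average query (illustrative only). */
class SelectJoinAverage {
    /** Illustrative event type: timestamp, join key, numeric payload. */
    static final class Event {
        final long timestampMillis;
        final String key;
        final double value;

        Event(long timestampMillis, String key, double value) {
            this.timestampMillis = timestampMillis;
            this.key = key;
            this.value = value;
        }
    }

    private final Predicate<Event> selection;
    private final long windowMillis;
    private final Deque<Event> leftBuffer = new ArrayDeque<>();
    private final Deque<Event> rightBuffer = new ArrayDeque<>();
    private double sumOfDifferences = 0.0;
    private long joinedPairs = 0;

    SelectJoinAverage(Predicate<Event> selection, long windowMillis) {
        this.selection = selection;
        this.windowMillis = windowMillis;
    }

    void onLeft(Event e) { process(e, leftBuffer, rightBuffer, true); }

    void onRight(Event e) { process(e, rightBuffer, leftBuffer, false); }

    /** Average of (left.value - right.value) over all joined pairs so far; NaN if none. */
    double average() {
        return joinedPairs == 0 ? Double.NaN : sumOfDifferences / joinedPairs;
    }

    private void process(Event e, Deque<Event> own, Deque<Event> other, boolean fromLeft) {
        // Select: drop events that fail the predicate.
        if (!selection.test(e)) {
            return;
        }
        // Expire buffered events that are no longer within the window of e.
        evict(own, e.timestampMillis);
        evict(other, e.timestampMillis);
        // Join: pair e with every buffered event of the other stream sharing its key.
        for (Event match : other) {
            if (match.key.equals(e.key)) {
                double diff = fromLeft ? e.value - match.value : match.value - e.value;
                // Average: fold each joined pair into the running aggregate.
                sumOfDifferences += diff;
                joinedPairs++;
            }
        }
        own.addLast(e);
    }

    private void evict(Deque<Event> buffer, long now) {
        while (!buffer.isEmpty() && buffer.peekFirst().timestampMillis <= now - windowMillis) {
            buffer.removeFirst();
        }
    }
}
```

On a real platform each of the three steps would typically be a separate operator that can be placed on a different node. They are fused into one class here only to keep the example short.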

Requirements

  • Motivation to engage in a research-oriented project.
  • Familiarity with basic query processing and optimization techniques from the database area.
  • Programming skills in Java; experience with stream data processing is a plus.

Contacts

In case of any questions, please drop us an email or come to our offices:

Site:
Contact: Tian Guo