Resource scheduling for distributed stream processing system in the cloud

Project Details

Laboratory: LSIR
Semester / Master
Completed

Description

We are living in the age of big data, where data is measured in terabytes, streamed in real time, and generated at unprecedented speeds in diverse forms. A common requirement in many emerging applications is the ability to process a continuous, high-volume stream of data in real time. Examples of such applications are sensor networks, real-time analysis of financial data, and intrusion detection. These applications are commonly referred to as data stream systems.

In this scenario, Storm, an open-source distributed real-time computation system, is a disruptive technology that is quickly gaining favor with big players such as Twitter and Groupon. In Storm, the stream is pipelined through a number of processing steps called operators (e.g., computing the average CO2 concentration in the city center over the last hour).
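To make the notion of an operator concrete, here is a minimal sketch of the CO2 example as a sliding-window average in plain Java. It does not use Storm's API: the class name `SlidingWindowAverage` and the method `onReading` are hypothetical, and the sketch assumes that readings arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical operator: average of the readings seen within a fixed time window. */
class SlidingWindowAverage {
    private final long windowMillis;
    private final Deque<Long> times = new ArrayDeque<>();
    private final Deque<Double> values = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindowAverage(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /**
     * Consumes one reading and returns the average over the window that ends
     * at timestampMillis. Assumes timestamps are non-decreasing.
     */
    double onReading(long timestampMillis, double value) {
        times.addLast(timestampMillis);
        values.addLast(value);
        sum += value;
        // Evict readings that have fallen out of the window.
        while (times.peekFirst() <= timestampMillis - windowMillis) {
            times.removeFirst();
            sum -= values.removeFirst();
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        SlidingWindowAverage co2 = new SlidingWindowAverage(3_600_000L); // one hour
        System.out.println(co2.onReading(0L, 400.0));
        System.out.println(co2.onReading(1_800_000L, 420.0));
        System.out.println(co2.onReading(3_600_000L, 440.0));
    }
}
```

In a real deployment, many instances of such an operator would run in parallel on different machines, which is why their placement matters.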

In this project, we focus on the scheduler of Storm, the core component in charge of assigning the computation resources of the cloud to the operators for parallel processing. The default scheduler of Storm assigns operators in a naive round-robin manner, without considering the different communication and computation overheads of the operators, which degrades performance. Therefore, in this project we aim to: (1) build a load-monitoring platform on top of Storm to collect run-time information (e.g., network traffic and workload) from the nodes in a Storm cluster; (2) design an advanced resource scheduling strategy that dynamically improves communication efficiency and workload distribution.
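The difference between round-robin and traffic-aware placement can be illustrated with a small, self-contained sketch. It is not Storm's scheduler API and not necessarily the strategy developed in this project. It is a simple greedy heuristic under stated assumptions: the class `PlacementSketch` and all of its methods are hypothetical, `traffic[i][j]` (for i < j) is an assumed measured tuple rate between operators i and j, and every node offers the same number of slots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: compares round-robin with a greedy traffic-aware placement. */
class PlacementSketch {

    /** Round-robin baseline: operator i goes to node (i mod nodeCount), ignoring traffic. */
    static int[] roundRobin(int operatorCount, int nodeCount) {
        int[] node = new int[operatorCount];
        for (int i = 0; i < operatorCount; i++) {
            node[i] = i % nodeCount;
        }
        return node;
    }

    /** Greedy heuristic: visit operator pairs by decreasing traffic and try to co-locate them. */
    static int[] trafficAware(long[][] traffic, int nodeCount, int slotsPerNode) {
        int n = traffic.length;
        if (n > nodeCount * slotsPerNode) {
            throw new IllegalArgumentException("not enough slots for all operators");
        }
        int[] node = new int[n];
        Arrays.fill(node, -1);            // -1 means "not placed yet"
        int[] used = new int[nodeCount];  // occupied slots per node

        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (traffic[i][j] > 0) pairs.add(new int[] {i, j});
            }
        }
        // Heaviest communicating pairs first.
        pairs.sort((p, q) -> Long.compare(traffic[q[0]][q[1]], traffic[p[0]][p[1]]));

        for (int[] p : pairs) {
            int a = p[0], b = p[1];
            if (node[a] < 0 && node[b] < 0) {
                int target = leastLoaded(used);
                if (used[target] + 2 <= slotsPerNode) {
                    node[a] = target;
                    node[b] = target;
                    used[target] += 2;
                }
            } else if (node[a] < 0 && node[b] >= 0 && used[node[b]] < slotsPerNode) {
                node[a] = node[b];
                used[node[b]]++;
            } else if (node[b] < 0 && node[a] >= 0 && used[node[a]] < slotsPerNode) {
                node[b] = node[a];
                used[node[a]]++;
            }
        }
        // Operators that could not be co-located go to the least-loaded node.
        for (int i = 0; i < n; i++) {
            if (node[i] < 0) {
                int target = leastLoaded(used);
                node[i] = target;
                used[target]++;
            }
        }
        return node;
    }

    private static int leastLoaded(int[] used) {
        int best = 0;
        for (int k = 1; k < used.length; k++) {
            if (used[k] < used[best]) best = k;
        }
        return best;
    }

    /** Total traffic between operators that ended up on different nodes. */
    static long interNodeTraffic(long[][] traffic, int[] node) {
        long total = 0;
        for (int i = 0; i < traffic.length; i++) {
            for (int j = i + 1; j < traffic.length; j++) {
                if (node[i] != node[j]) total += traffic[i][j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Chain of four operators: 0-1 heavy, 1-2 light, 2-3 heavy (made-up rates).
        long[][] traffic = new long[4][4];
        traffic[0][1] = 100;
        traffic[1][2] = 10;
        traffic[2][3] = 100;
        int[] rr = roundRobin(4, 2);
        int[] ta = trafficAware(traffic, 2, 2);
        System.out.println(Arrays.toString(rr) + " cross-node traffic = " + interNodeTraffic(traffic, rr));
        System.out.println(Arrays.toString(ta) + " cross-node traffic = " + interNodeTraffic(traffic, ta));
    }
}
```

On this made-up input, round-robin separates both heavily communicating pairs, while the greedy placement keeps each pair on one node and leaves only the light link crossing the network. A practical strategy would also have to balance computation load and react to changing traffic, which this sketch deliberately ignores.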

Prerequisites

  • Motivation to engage in a research-oriented project.
  • Familiarity with basic query processing and optimization techniques from the database area.
  • Programming skills in Java are a plus.
  • Experience with stream data processing systems is a plus.

Contacts

If you have any questions, please send us an email or visit our offices:

Site:
Contact: Tian Guo