Interactive Streaming for Multi-View Systems
In new multimedia services, users do not passively download media content but rather dynamically select the content they are interested in. Resource allocation strategies can therefore no longer be designed offline according to predefined user behaviors. Effective real-time interactive services can only be devised if adaptivity to channel conditions and user dynamics is the primary feature of the media delivery strategy.
We consider a live acquisition scenario in which multiple cameras capture the same scene from different viewpoints and generally produce correlated video streams. This results in large amounts of highly redundant data. In order to save resources, it is critical to properly handle this correlation during encoding and transmission of the multi-view data. In our research, we propose correlation-aware transmission strategies that find the best scheduling policy, defined as the one that maximizes the quality experienced by users during scene navigation while satisfying the channel constraints.
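As a rough illustration of the idea, the sketch below uses a hypothetical quality model (the rates, gains, and pairwise correlation values are made-up numbers, not taken from our work): a greedy scheduler picks views by marginal quality per bit, where a view's gain is discounted by its correlation with views already scheduled, so redundant streams are deprioritized under a tight channel budget.

```python
# Sketch (hypothetical model): greedy correlation-aware scheduling.
# Each view has a rate (bits) and a base quality gain; a view's marginal
# gain shrinks when strongly correlated views are already scheduled.

def schedule_views(views, correlation, budget):
    """views: dict name -> (rate, gain); correlation: dict (a, b) -> rho in [0, 1]
    with keys sorted alphabetically. Greedily schedule views maximizing
    marginal quality gain per bit under a total rate budget."""
    scheduled, used = [], 0.0
    remaining = set(views)
    while remaining:
        best, best_score = None, 0.0
        for v in remaining:
            rate, gain = views[v]
            if used + rate > budget:
                continue  # violates the channel constraint
            # discount the gain by redundancy with already-scheduled views
            redundancy = max((correlation.get(tuple(sorted((v, s))), 0.0)
                              for s in scheduled), default=0.0)
            score = gain * (1.0 - redundancy) / rate
            if score > best_score:
                best, best_score = v, score
        if best is None:
            break
        scheduled.append(best)
        used += views[best][0]
        remaining.remove(best)
    return scheduled

views = {"cam1": (2.0, 10.0), "cam2": (2.0, 9.5), "cam3": (2.5, 8.0)}
corr = {("cam1", "cam2"): 0.9, ("cam1", "cam3"): 0.2, ("cam2", "cam3"): 0.3}
print(schedule_views(views, corr, budget=5.0))  # ['cam1', 'cam3']
```

Note how cam2, despite its high standalone quality, is passed over: its strong correlation with the already-scheduled cam1 makes cam3 the better use of the remaining bandwidth.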
Relevant publications can be found here, for a correlation-aware scheduling optimization, and here, for a joint optimization of coding and scheduling strategies given statistical information about users' interactivity.
In-Network View Synthesis
Interactive free viewpoint video systems endow users with the ability to choose and display any virtual view of a 3D scene, given original viewpoint images captured by multiple cameras. In particular, a virtual view image can be synthesized by the decoder via depth-image-based rendering (DIBR), using the texture and depth images of two neighboring views that act as reference viewpoints. One of the key challenges in interactive multi-view video streaming systems is to transmit an appropriate subset of reference views, out of a potentially large number of camera-captured views, such that the client enjoys high-quality and low-delay view navigation even in resource-constrained environments.
In our research, we propose a new paradigm to solve the reference view selection problem, capitalizing on cloud computing resources to perform fine adaptation close to the clients. The key intuition is that, in resource-constrained networks, re-sampling the viewpoints of the 3D scene in the network (i.e., synthesizing novel virtual views in the cloudlets that are transmitted as new references to the decoder) is beneficial compared to merely subsampling the original set of camera views.
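This intuition can be sketched with a deliberately simplified, one-dimensional toy model (the camera positions and the distance-based distortion proxy below are illustrative assumptions, not the actual DIBR distortion model): if synthesis quality degrades with the distance from the requested viewpoint to its nearest reference, then virtual references placed freely along the navigation segment cover it better than any subset of the original camera positions.

```python
# Sketch (hypothetical distortion proxy): with references on a 1-D camera
# line and a navigation segment [a, b], assume DIBR distortion grows with
# the distance from the requested viewpoint to the nearest reference view.

def worst_case_distance(references, a, b):
    """Largest distance from any viewpoint in [a, b] to its nearest reference."""
    refs = sorted(references)
    worst = max(refs[0] - a, b - refs[-1])       # uncovered ends of the segment
    for r0, r1 in zip(refs, refs[1:]):
        worst = max(worst, (r1 - r0) / 2)        # midpoint between references
    return worst

cameras = [0, 1, 2, 3, 4]                        # equally spaced capture positions
subsampled = [cameras[0], cameras[-1]]           # keep 2 of the original cameras
virtual = [1.0, 3.0]                             # 2 in-network synthesized references
print(worst_case_distance(subsampled, 0, 4))     # 2.0
print(worst_case_distance(virtual, 0, 4))        # 1.0
```

With the same transmission budget of two reference views, the synthesized pair halves the worst-case viewpoint distance, which under the assumed model translates into better navigation quality at the client.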
Relevant publications can be found here.
Acknowledgements: The ballet image is borrowed from the Microsoft webpage http://research.microsoft.com/en-us/downloads/5e4675af-03f4-4b16-b3bc-a85c5bafb21d/