IM2.IP1: Integrated Multimodal Processing
Duration of project
From January 2010 to December 2013
Swiss National Science Foundation
This project will focus on the IM2 core multimodal technologies (speech processing, visual processing, integration and coordination of modalities, and further development and evaluation of meeting browsers), geared towards integration into end-to-end applications and the consolidation of all IM2 activities developed in Phases I and II. The research focus in IP1 will also be driven by the findings and possible requirements arising from IP2.
IP1 has a research component (pursuing the most promising and/or fundamental research directions initiated in IM2 Phase II), as well as a strong integration and evaluation component. Hence, besides further pursuing some of the most promising research directions in multimodal processing within the IM2 vision, one of the objectives of IP1 will be to extend the applications of multimodal technologies, within the human meeting and conference framework, towards more integrated systems that work in real time, with human intervention only when required.
The MMSPL team is involved in IM2.IP1, working on:
- Multimodal quality metrics for multimedia content abstraction
Multimedia services rely on two main actors: the human subject, who is the end user of the service, and the multimedia content, which is the object of the multimedia communication. In this scenario, one of the most relevant features, implicitly taken into account by the end user, is the quality of the multimedia data involved in the application of interest. Users interact with the multimedia data and readily judge the quality of its content and, more broadly, the quality of the multimedia experience in which they are participating. “Quality” is a particular feature of multimedia content: it depends on the peculiarities of the content itself, but it is also strictly related to the subjectivity of the human beings who interact with it. This is why the user-media interaction can be described as a “multimedia experience”.
Our research in this scenario focuses on modeling subjective quality assessment and mapping it into objective algorithms, i.e. metrics. The goal of our study is to design objective metrics that automatically evaluate the quality of multimedia content and correlate highly with real human perception. In particular, an important part of our research concentrates on understanding and modeling the multi-modal perception of quality, i.e. visual, audio and audio-visual quality, in order to design a metric for the assessment of the more general and complex concept of Quality of Experience in a multimedia service.
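To illustrate what a full-reference objective quality metric looks like in its simplest form, the sketch below computes PSNR (peak signal-to-noise ratio) between a reference and a distorted image. This is a standard baseline metric, not the perceptual metrics developed in this project; the arrays and values are toy examples.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio, in dB, between a reference and a distorted image."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: a uniform pixel error of 10 gives MSE = 100
ref = np.full((4, 4), 100.0)
dist = ref + 10.0
print(round(psnr(ref, dist), 2))  # 10*log10(255^2/100) ≈ 28.13
```

Such signal-fidelity metrics correlate only loosely with human judgments, which is precisely the gap that perceptually motivated, multi-modal quality metrics aim to close.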
- Tagged media-aware multimodal content annotation
Approaches to multimedia content access based only on content analysis have not delivered widely accepted solutions. User activities in social networks, such as tagging, annotating and rating multimedia content, provide an entirely new view on how to solve the multimedia content access problem.
The goal of this research is to find new models of interaction between automatic multimedia content analysis and social tagging. This project draws on successful services and products such as Flickr, Facebook, YouTube, MySpace, and many others. Our research addresses the challenge of efficient management and organization of image collections by enriching images with a semantic context. More details about this research activity can be found here.
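The core idea of tag propagation can be sketched as a nearest-neighbour vote in a visual feature space: untagged images inherit the tags of their most similar tagged neighbours. The feature vectors, tags, and distance measure below are hypothetical toy examples, not the project's actual system.

```python
import numpy as np
from collections import Counter

def propagate_tags(tagged_features, tagged_labels, query_features, k=3):
    """Assign each untagged (query) image the majority tag of its k visually
    nearest tagged images, using Euclidean distance in feature space."""
    results = []
    for q in query_features:
        dists = np.linalg.norm(tagged_features - q, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(tagged_labels[i] for i in nearest)
        results.append(votes.most_common(1)[0][0])
    return results

# Toy example: 2-D "visual features" for images of two landmarks
feats = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = ["eiffel_tower", "eiffel_tower", "colosseum", "colosseum"]
queries = np.array([[0.2, 0.0], [4.8, 5.2]])
print(propagate_tags(feats, labels, queries, k=2))
# -> ['eiffel_tower', 'colosseum']
```

In practice, propagation must also weigh who supplied each tag, which is where the user trust modeling explored in this research comes in.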
Results and resources
Multimodal quality metrics for multimedia content abstraction
- Subjective evaluation of next-generation video compression algorithms: a case study (presented at SPIE’10) [paper]
- Subjective evaluation of scalable video coding for content distribution (presented at ACM MM’10) [paper]
- Gesture and touch controlled video player interface for mobile devices (presented at ACM MM’10) [paper]
- Audio-visual asynchrony detection in multimedia content (presented at NEM’10) [paper] [presentation]
Tagged media-aware multimodal content annotation
- Tag propagation system [demo] [YouTube video 1] [YouTube video 2] [YouTube video 3] [YouTube video 4] [YouTube video 5] [YouTube video 6]
- Semi-automatic image annotation via tag propagation (presented at ACM MIR’10) [paper] [poster] [slides]
- Geotag propagation (presented at SPIE’10) [paper] [slides]
- Geotag propagation based on user trust modeling (published in MTAP) [paper] [slides]
- In tags we trust: Trust modeling in social tagging of multimedia content (published in IEEE SPM) [paper]
- Spam fighting in social tagging systems (presented at SOCINFO’12) [paper]
- Geotag propagation with user trust modeling (published in Springer SMR book) [paper]
- Comparative study of trust modeling for automatic landmark tagging (published in IEEE TIFS) [paper]
- Social game “Epitome” [demo] [presentation] [YouTube video 1] [YouTube video 2] [YouTube video 3]
- Photo album summarization through social game “Epitome” (presented at ACM MM’10) [paper] [poster] [slides]
- Comparison between social game “Epitome” and automatic visual analysis (presented at ICME’11) [paper]
- Epitomize your photos (published in IJCGT) [paper]