Available Projects – Fall 2019

Description:
Image restoration with machine learning has attracted a lot of research interest with the recent advances in deep convolutional networks. A multitude of methods have been proposed for super-resolution, deblurring, denoising, etc. Each has limitations and drawbacks that make it a poor candidate for generalization and real-world use. In this project, you will examine a set of state-of-the-art image restoration models and analyze them through a thorough empirical study. The focus will be on joint super-resolution and denoising, and you can (optionally) implement an extension idea that we can propose to you.

Example methods to run:
https://github.com/cszn/SRMD
https://github.com/cszn/IRCNN
https://github.com/cszn/DPSR
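
As a concrete starting point for the empirical study, the sketch below (using OpenCV and NumPy, with a hypothetical test image path) generates degraded inputs by bicubic downsampling followed by additive Gaussian noise, and measures the PSNR of a restored result against the ground truth. It only illustrates an evaluation protocol under these assumptions; it is not part of the listed repositories.

import cv2
import numpy as np

def degrade(img, scale=2, noise_sigma=10):
    """Bicubic downsampling followed by additive Gaussian noise (sigma in 8-bit units)."""
    h, w = img.shape[:2]
    low = cv2.resize(img, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    noisy = low.astype(np.float32) + np.random.normal(0, noise_sigma, low.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def psnr(ref, est):
    """Peak signal-to-noise ratio between two 8-bit images of equal size."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

if __name__ == "__main__":
    gt = cv2.imread("ground_truth.png")          # hypothetical test image
    lr_noisy = degrade(gt, scale=2, noise_sigma=10)
    # Placeholder "restoration": plain bicubic upsampling; a real model (e.g. SRMD/DPSR) goes here.
    restored = cv2.resize(lr_noisy, (gt.shape[1], gt.shape[0]), interpolation=cv2.INTER_CUBIC)
    print("PSNR of baseline restoration: %.2f dB" % psnr(gt, restored))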

Deliverables:
Reproducible code for all experiments, and experimental results with proper visualizations to draw insights from your research findings.

Prerequisites:
Basic image processing knowledge, some familiarity with deep learning.

Level:
MS semester project.

Type of work:
60% implementation, 40% research.

Supervisors:
Majed El Helou, Ruofan Zhou.

Description:
Context-aware image retargeting aims to arbitrarily adjust an image's aspect ratio while preserving visually salient features. To this end, the salient regions in an image are first estimated, and the image is then retargeted with the saliency map according to the desired aspect ratio. Traditional methods for this task, including those based on deep convolutional neural networks (CNNs), estimate saliency from the image alone. Meanwhile, an image caption that describes the image in natural language can be helpful for estimating the salient regions. In this project, we will propose a novel framework for caption-aware image retargeting that estimates image saliency from the corresponding caption and uses it for retargeting. In experiments, we will investigate how much the image retargeting performance is boosted by leveraging the image caption.
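
To illustrate how a saliency map can steer retargeting, the sketch below implements a minimal seam-carving step (one classical retargeting operator, used here only as an assumed example): the per-pixel energy is taken directly from the saliency map, so low-saliency seams are removed first. Caption-derived saliency would simply replace the map passed in.

import numpy as np

def remove_vertical_seam(image, saliency):
    """Remove the lowest-saliency vertical seam from an H x W x 3 image.

    `saliency` is an H x W map; higher values mark regions to preserve."""
    h, w = saliency.shape
    cost = saliency.astype(np.float64).copy()
    # Dynamic programming: accumulate the minimal seam cost from top to bottom.
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        up = cost[y - 1]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, up), right)
    # Backtrack the seam from the bottom row upwards.
    seam = np.zeros(h, dtype=np.int64)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    # Remove one pixel per row along the seam.
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    carved = image[keep].reshape(h, w - 1, -1)
    new_sal = saliency[keep].reshape(h, w - 1)
    return carved, new_sal

def retarget(image, saliency, target_width):
    """Repeatedly carve seams until the image reaches the target width."""
    while image.shape[1] > target_width:
        image, saliency = remove_vertical_seam(image, saliency)
    return image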

References:
[1] Goferman et al., “Context-Aware Saliency Detection,” TPAMI, 2012
[2] Cho et al., “Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting,” ICCV, 2017
[3] Wang et al., “Learning to Detect Salient Objects with Image-level Supervision,” CVPR, 2017
[4] Zeng et al., “Multi-source Weak Supervision for Saliency Detection,” CVPR, 2019

Deliverables:
Report, running prototype, and research paper if possible.

Prerequisites:
Experience in computer vision, machine learning, and especially deep learning.

Level:
MS semester project (potentially BS).

Type of Work:
50% research, 50% development and testing.

Supervisor:
Seungryong Kim

Description:
In this project, you will review the existing literature on weakly-supervised and unsupervised depth estimation and build a model for estimating depth maps in comics images. A myriad of depth estimation techniques have been applied to real-world images and found to work considerably well. However, it is challenging to achieve the same results in other image domains such as comics. A possible solution to this problem is domain adaptation, where a model pretrained on a natural-image dataset is transferred to a comics dataset. A better solution is to develop a weakly-supervised technique for depth estimation in the comics domain. A good starting point is [3].
In this project, you will propose a framework for translating natural images to the comics domain [1-2] and a depth estimation network for the comics domain trained in a weakly-supervised manner. You may contact any of the supervisors at any time should you want to discuss the idea further.
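
A minimal sketch of one way to compare depth predictions across domains, assuming predicted and reference depth maps are available as NumPy arrays; the metric here (the standard scale-invariant log error) is used purely as an illustration, not as a required choice.

import numpy as np

def scale_invariant_log_error(pred, gt, eps=1e-6):
    """Scale-invariant log depth error (lower is better).

    `pred` and `gt` are positive depth maps of the same shape."""
    d = np.log(pred + eps) - np.log(gt + eps)
    return float(np.mean(d ** 2) - np.mean(d) ** 2)

# Hypothetical usage: compare the same pretrained model on a natural image,
# its translated comics version, and a real comics panel.
# err_natural = scale_invariant_log_error(depth_model(natural_img), depth_gt)
# err_comics  = scale_invariant_log_error(depth_model(comics_img), depth_gt)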

Tasks:
– Understand the literature and our framework.
– Implement an existing state-of-the-art (SOTA) depth estimation model trained on natural images.
– Develop a method to translate natural images to comics images using the depth maps of the natural images.
– Compare the performance of existing SOTA methods on natural images, generated images, and comics images.

Deliverables:
At the end of the semester, the student should provide a framework for depth estimation in the comics domain, along with a project report based on this work.

Prerequisites:
Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work:
50% research, 50% development and testing

References:
[1] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, in IEEE International Conference on Computer Vision (ICCV), 2017. 
[2] Youssef A. Mejjati, Christian Richardt, James Tompkin, Darren Cosker, and Kwang In Kim, “Unsupervised Attention-guided Image-to-Image Translation”, in Advances in Neural Information Processing Systems (NIPS), 2018.
[3] Andrea Pilzer, Dan Xu, Mihai Marian Puscas, Elisa Ricci and Nicu Sebe, “Unsupervised Adversarial Depth Estimation using Cycled Generative Networks”, in Proceedings of the 6th International Conference on 3D Vision (3DV 2018). IEEE, 2018.
[4] Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, and Raquel Urtasun. “Monocular Object Instance Segmentation and Depth Ordering with CNNs”, in IEEE International Conference on Computer Vision (ICCV), 2015. 

Level:
Master

Supervisor:
Deblina Bhattacharjee ([email protected]), Seungryong Kim ([email protected])

Description:
Visual saliency refers to the parts of a scene that stand out relative to their neighbors and thus capture our attention.
Conventionally, many saliency detection techniques, including approaches based on deep convolutional neural networks (CNNs), have been developed on natural images. However, these existing techniques perform considerably worse on other imaging domains such as cartoons, artwork, and comics, due to the lack of annotated saliency maps for these images.
In this project, we will propose a framework to build pseudo ground-truth saliency datasets by translating natural images and their corresponding saliency maps to the comics domain with a saliency-guided image-to-image translation method, and then use them to train saliency detection networks for the comics domain in a weakly-supervised manner.
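
One possible ingredient of the saliency guidance, sketched below in PyTorch as an assumption rather than a finalized design, is a reconstruction term that weights errors by the saliency map, so salient regions are preserved more faithfully during translation.

import torch

def saliency_weighted_l1(output, target, saliency, alpha=1.0):
    """L1 reconstruction loss where salient pixels receive a higher weight.

    output, target: N x C x H x W tensors; saliency: N x 1 x H x W in [0, 1]."""
    weight = 1.0 + alpha * saliency            # salient pixels weighted up to (1 + alpha)
    return torch.mean(weight * torch.abs(output - target))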

Tasks:
– Understand the literature and the state of the art
– Implement existing saliency detection algorithms
– Implement existing image-to-image translation algorithms
– Develop a method to translate natural images to comics images with saliency guidance
– Compare the performance of existing saliency algorithms on natural images, generated images, and comics images

References:
[1] Khetarpal, Khimya & Jain, Eakta. A preliminary benchmark of four saliency algorithms on comic art. (2016).
[2] Mejjati, Y. A., Richardt, C., Tompkin, J., Cosker, D., & Kim, K. I. Unsupervised Attention-guided Image-to-Image Translation. In Advances in Neural Information Processing Systems (2018).
[3] Hoffman, J., Tzeng, E., Park, T., Zhu, J. Y., Isola, P., Saenko, K. & Darrell, T. Cycada: Cycle-consistent adversarial domain adaptation. (2017).

Deliverables:
At the end of the semester, the student should provide a framework that produces the translated images, along with a report of the work.

Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work:
60% research, 40% development and testing

Level:
MS semester project

Supervisor:
Bahar Aydemir ([email protected]), Seungryong Kim ([email protected])

Synopsis:
Traditional methods for instance-level image segmentation have limited ability to deal with other imaging domains such as comics, due to the lack of annotated data in these domains. In this project, we will implement state-of-the-art methods for this task and apply them to comics datasets. In addition, we will propose a weakly- or unsupervised instance-level image segmentation method that leverages a domain adaptation technique.
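
As a first baseline before any domain adaptation, the sketch below runs a Mask R-CNN pretrained on COCO (via torchvision) on a comics panel; this is only one convenient off-the-shelf instance segmentation model, not a method prescribed by the project, and the image path is hypothetical.

import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Load a COCO-pretrained Mask R-CNN (requires torchvision >= 0.3).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("comic_panel.png").convert("RGB")   # hypothetical comics image
tensor = F.to_tensor(image)                            # C x H x W in [0, 1]

with torch.no_grad():
    prediction = model([tensor])[0]

# Keep confident detections only and report their predicted COCO classes.
keep = prediction["scores"] > 0.5
print("detected instances:", prediction["labels"][keep].tolist())
print("mask tensor shape:", prediction["masks"][keep].shape)  # N x 1 x H x W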

References:
[1] P. O. Pinheiro, R. Collobert, and P. Dollár, “Learning to segment object candidates,” NIPS, 2015.
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization,” CVPR, 2016.
[3] A. Rozantsev, M. Salzmann, and P. Fua, “Residual parameter transfer for deep domain adaptation,” CoRR, 2017.

Deliverables:
Report and reproducible implementations

Prerequisites:
Experience with deep learning in PyTorch or another framework, and computer vision

Level:
MS semester project

Type of work:
60% research, 40% implementation

Supervisors:
Ihsan Utlu, Seungryong Kim

Description:
This project focuses on saliency prediction for 360-degree images of architectural scenes viewed in virtual reality (VR). Saliency prediction on omni-directional images has recently received considerable attention [1-5], but how well these predictors transfer to architectural scenes remains to be studied. In this project, you will review the existing approaches, apply and adapt them to VR architectural scenes, and analyze the experimental results to draw insights from your findings.

References:
[1] M. Assens Reina, X. Giro-i-Nieto, K. McGuinness, and N. E. O’Connor, “SaltiNet: Scan-path prediction on 360 degree images using saliency volumes,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2331–2338.
[2] R. Monroy, S. Lutz, T. Chalasani, and A. Smolic, “SalNet360: Saliency Maps for omni-directional images with CNN,” Signal Process. Image Commun., 2018.
[3] M. Startsev and M. Dorr, “360-aware saliency estimation with conventional image saliency predictors,” Signal Process. Image Commun., vol. 69, pp. 43–52, 2018.
[4] Y. Zhu, G. Zhai, and X. Min, “The prediction of head and eye movement for 360 degree images,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 15–25, Nov. 2018.
[5] J. Ling, K. Zhang, Y. Zhang, D. Yang, and Z. Chen, “A saliency prediction model on 360 degree images using color dictionary based sparse representation,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 60–68, Nov. 2018.

Deliverables:
Reproducible code for saliency prediction in VR architectural scenes, and a report on the experimental results with proper visualizations and insights from your research findings.

Prerequisites:
Familiarity with deep learning, experience in Python, experience in statistical analysis.

Level:
MS semester project (potentially BS).

Type of Work:
50% research, 50% development and testing.

Supervisor:
Seungryong Kim ([email protected]), Bahar Aydemir ([email protected]), and Caroline Karmann ([email protected]) from LIPID (ENAC)

Description:
Extreme video completion, where only a small percentage of pixels (~1%) is retained, allows for very cheap compression in terms of pre-processing. In this project, you will implement a video completion algorithm whose description is given to you. You will then perform a set of evaluation experiments to analyze the trade-off between perceived video quality and reconstruction error. For this evaluation, you will also compare your results with standard video codecs (e.g., MPEG).
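
To make the setting concrete, the sketch below keeps a random ~1% of pixels in a frame and reconstructs the rest with a simple off-the-shelf inpainting call; it only illustrates the retention step and the PSNR measurement, not the completion algorithm whose description you will receive, and the frame path is hypothetical.

import cv2
import numpy as np

def retain_pixels(frame, keep_ratio=0.01):
    """Zero out all but a random `keep_ratio` fraction of pixels; return the sparse frame and mask."""
    mask = np.random.rand(*frame.shape[:2]) < keep_ratio
    sparse = np.zeros_like(frame)
    sparse[mask] = frame[mask]
    return sparse, mask

def psnr(ref, est):
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

if __name__ == "__main__":
    frame = cv2.imread("frame.png")                       # hypothetical video frame
    sparse, mask = retain_pixels(frame, keep_ratio=0.01)
    # Naive baseline reconstruction: diffusion-based inpainting of the missing pixels.
    missing = (~mask).astype(np.uint8) * 255
    recon = cv2.inpaint(sparse, missing, 5, cv2.INPAINT_NS)
    print("baseline PSNR: %.2f dB" % psnr(frame, recon))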

Note:
A research paper publication, which we would work on with you, is a very likely outcome of this project.

Deliverables:
Clean implementation code for reproducible experiments.

Prerequisites:
Basic image processing knowledge, algorithms/optimization background is a plus.

Level:
MS semester project (potentially BS).

Type of work:
80% implementation, 20% research.

Supervisors:
Majed El Helou, Ruofan Zhou.

Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses line halftones to display simple graphical elements such as a logo. The software has now been extended to hide the watermark within dispersed line segments. The goal of this project is to adapt this software to run on an Android smartphone, and to tune and optimize the available parameters.

Deliverables:
Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level:
BS or MS semester project or possibly master project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09

Description:
Startup company Innoview Sàrl has developed software that recovers a watermark with a smartphone by superposing a software revealer on top of a base image obtained by camera acquisition. This project aims at extending that software to recover watermarks printed on curved surfaces, such as the label of a wine bottle.

Deliverables:
Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing
– basic coding skills in Matlab and Java Android

Level:
BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09


Epson Printer Driver for synthesizing hidden codes

Description:
Startup company Innoview Sàrl has developed software to recover by smartphone a hidden watermark printed on a desktop Epson printer. A special Epson P50 printer driver enables printing the hidden watermark. The Epson P50 has since been replaced by newer Epson printer models that require modified driver software. The project consists in understanding the previous driver software and modifying it so as to drive the new Epson printers. Reverse engineering may be necessary to obtain some of the new, undocumented driver codes.

Deliverables:
Report and running prototype (C, C++ or Matlab).

Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++ or Matlab

Level:
BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09

Description:
In this project, you will work on a tool for the automatic generation of movie trailers. Based on our database of ~15,000 trailers and using our Deep Learning model for automatic video genre detection, you will develop a solution to identify the scenes of a movie that have the most potential for an effective movie trailer.
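
A minimal, assumed starting point for splitting a movie into candidate scenes is shot-boundary detection from color-histogram differences, sketched below with OpenCV; the actual selection of trailer-worthy scenes would come from the genre-detection model applied on top of such segments, and the video path is hypothetical.

import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    """Return frame indices where the HSV color histogram changes abruptly."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is not None:
            # A Bhattacharyya distance close to 1 means the two frames look very different.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                boundaries.append(index)
        prev_hist, index = hist, index + 1
    cap.release()
    return boundaries

# Hypothetical usage:
# cuts = detect_shot_boundaries("movie.mp4")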

Tasks:
– Understand the literature and our framework.
– Perform an in-depth statistical analysis of our trailer database.
– Implement a solution for trailer scene detection based on our Deep Learning model.
– Develop a testing solution and test the model on a case study.

Deliverables:
At the end of the semester, the student should provide a framework for automatic trailer scene detection.

Prerequisites:
Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work:
50% research, 50% development and testing

References:
– M. Hesham, B. Hani, N. Fouad and E. Amer, “Smart trailer: Automatic generation of movie trailer using only subtitles,” 2018 First International Workshop on Deep and Representation Learning (IWDRL), Cairo, 2018, pp. 26-30.
– Y. Hou et al., “Predicting Movie Trailer Viewer’s “Like/Dislike” via Learned Shot Editing Patterns,” in IEEE Transactions on Affective Computing, vol. 7, no. 1, pp. 29-44, 1 Jan.-March 2016.
– Salma Karray & Lidia Debernitz (2017) The effectiveness of movie trailer advertising, International Journal of Advertising, 36:2, 368-392

Level:
Master

Supervisor:
Sami Arpa ([email protected]), Gabriel Autes ([email protected])

Description:
In this project, you will work on the development of machine learning models for the prediction of movie performance. Based on our database of ~20,000 movie trailers and ~1,000 movie scripts, and using our Deep Learning model for automatic video and text genre detection, you will develop a solution to predict the performance of movies by combining video/script analysis with features extracted from network representations of the Internet Movie Database.
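
A minimal sketch of the prediction side, assuming a feature table has already been extracted from trailers, scripts, and the IMDb graph (the file name and the "box_office" target column are hypothetical); it simply cross-validates a gradient-boosting regressor with scikit-learn.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical table: one row per movie with pre-extracted features and a performance target.
data = pd.read_csv("movie_features.csv")
X = data.drop(columns=["box_office"])
y = data["box_office"]

model = GradientBoostingRegressor(n_estimators=300, max_depth=3)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))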

Tasks:
– Understand the literature and our framework.
– Perform an in-depth statistical analysis of our movie database.
– Implement and test different machine learning approaches for movie performance prediction.
– Test the models on case studies.

Deliverables:
At the end of the semester, the student should have implemented and tested machine learning models for movie performance prediction.

Prerequisites:
Experience in deep learning and machine learning, experience in Python. Experience in statistical analysis.

Type of work:
50% research, 50% development and testing

References:
– M. Ghiassi, David Lio, Brian Moon, Pre-production forecasting of movie revenues with a dynamic artificial neural network, Expert Systems with Applications, Volume 42, Issue 6, 2015, Pages 3176-3193
– Simonoff, J. S. and Sparrow, I. R. Predicting movie grosses: Winners and losers, blockbusters and sleepers. In Chance, 2000.

Level:
Master

Supervisor:
Sami Arpa ([email protected]), Gabriel Autes ([email protected])

Synopsis:
3D reconstruction of objects has traditionally been done with a set of two or more cameras. We would like to instead replace the additional cameras with mirrors, allowing us to capture a scene from several viewpoints within a single image by using a single camera.
In this project, you will implement a system capable of 3D object reconstruction. You will start by designing and building a box composed of four mirrors that will be placed in front of the camera. The reflections from this box of mirrors will provide four side views in addition to the central view [1]. The central view will be projected in the center of the camera image, and the additional views will be located on its sides. As in traditional stereo systems, you will calibrate the setup, rectify the side, top, and bottom views, and match image features along epipolar lines across the views [2]. Finally, you will perform triangulation to reconstruct the 3D scene.
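
The sketch below shows only the last step (triangulation of matched points with OpenCV), assuming the 3x4 projection matrices of the central and one mirrored view have already been obtained from calibration; it is an illustration, not the full pipeline.

import cv2
import numpy as np

def triangulate(P_center, P_side, pts_center, pts_side):
    """Triangulate matched 2-D points (N x 2 arrays) given two 3x4 projection matrices."""
    # OpenCV expects the points as 2 x N arrays of floating-point values.
    pts1 = np.asarray(pts_center, dtype=np.float64).T
    pts2 = np.asarray(pts_side, dtype=np.float64).T
    points_4d = cv2.triangulatePoints(P_center, P_side, pts1, pts2)   # homogeneous, 4 x N
    return (points_4d[:3] / points_4d[3]).T                           # N x 3 Euclidean points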

References:
[1] Joshua Gluckman and Shree K. Nayar. “Rectified catadioptric stereo sensors.” IEEE transactions on pattern analysis and machine intelligence (2002).
[2] Joshua Gluckman and Shree K. Nayar. “Catadioptric stereo using planar mirrors.” International Journal of Computer Vision (2001).

Deliverables:
Report and running prototype.

Prerequisites:
– knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level:
MS semester project

Type of work:
50% implementation and 50% research

Supervisor:
Marjan Shahpaski ([email protected])

Synopsis:
Radial imaging systems capture a scene from a large number of viewpoints within a single image, using a curved mirror and a camera. These systems can recover scene properties such as scene geometry, surface texture, and BSDF [1].

In this project, you will implement a system that simulates an image captured by such a setup. You will build a simple 3D scene that contains the curved mirror and the camera in Blender. Then you will modify the Mitsuba physically based renderer [2][3] to be able to work with empirical (measured) BSDFs.
To validate that the system works correctly, you will assign a measured BSDF to an object in the 3D scene. Once the object is illuminated from a known incident direction, the outgoing light’s intensity will vary according to the object’s BSDF. A discrete set of outgoing angles will then be captured by the camera, after undergoing multiple reflections from the curved mirror’s walls. The set of outgoing angles, together with the light intensity at each angle, will let us reconstruct the object’s BSDF.
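
A small illustration of what working with empirical BSDFs involves (not the Mitsuba plugin itself): a tabulated, isotropic BRDF sampled on a grid of incident/outgoing angles is looked up by interpolation. The file name, grid resolution, and angle convention below are all assumptions.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical measured, isotropic BRDF table indexed by
# (theta_in, theta_out, delta_phi) in radians, stored as a NumPy array.
theta_in = np.linspace(0, np.pi / 2, 30)
theta_out = np.linspace(0, np.pi / 2, 30)
delta_phi = np.linspace(0, np.pi, 60)
table = np.load("measured_brdf.npy")            # hypothetical file of shape (30, 30, 60)

brdf = RegularGridInterpolator((theta_in, theta_out, delta_phi), table,
                               bounds_error=False, fill_value=0.0)

def eval_brdf(ti, to, dphi):
    """Interpolated BRDF value for one incident/outgoing direction pair."""
    return float(brdf([[ti, to, dphi]]))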

References:
[1] Sujit Kuthirummal and Shree K. Nayar. “Multiview radial catadioptric imaging for scene capture.” ACM Transactions on Graphics (2006).
[2] http://www.mitsuba-renderer.org
[3] https://github.com/mitsuba-renderer/mitsuba

Deliverables:
Report and running prototype.

Prerequisites:
– knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level:
BS/MS semester project

Type of work:
50% implementation and 50% research

Supervisor:
Marjan Shahpaski

Description:
In video super-resolution, the spatio-temporal coherence among frames can be exploited for accurate prediction of the high-resolution frames. Although many state-of-the-art super-resolution methods use temporal information in their input to boost performance, they still favor simple single-frame norms for the loss function, which can lead to undesirable flickering and artifacts in the results. In this project, you will combine state-of-the-art video super-resolution networks with a temporally coherent loss, to see whether it helps remove temporal artifacts and improve the perceptual quality of the video.
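
One simple form a temporally coherent loss can take, sketched here in PyTorch as an assumption rather than the method to implement: it penalizes the difference between the temporal gradients of the predicted and ground-truth sequences, on top of a per-frame term.

import torch

def temporal_coherence_loss(pred, target, lambda_t=1.0):
    """Per-frame L1 plus a penalty on mismatched frame-to-frame changes.

    pred, target: N x T x C x H x W video tensors."""
    frame_loss = torch.mean(torch.abs(pred - target))
    # Temporal gradients: differences between consecutive frames.
    pred_dt = pred[:, 1:] - pred[:, :-1]
    target_dt = target[:, 1:] - target[:, :-1]
    temporal_loss = torch.mean(torch.abs(pred_dt - target_dt))
    return frame_loss + lambda_t * temporal_loss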

Tasks:
– Literature review on convolutional neural network based methods for video super-resolution
– Collect datasets
– Literature review on video quality assessment
– Implement and evaluate a temporally coherent loss for video super-resolution
– (Optional) implement and evaluate a temporally coherent GAN for video super-resolution

Deliverables:
Report and implementation of deep convolutional networks

Prerequisites:
Experience/interest in convolutional neural networks, and in image and video processing

References:
[1] Younghyun Jo, et al., “Deep Video Super-Resolution Using Dynamic Upsampling Filters Without Explicit Motion Compensation”, CVPR 2018
[2] Ce Liu et al., “A Bayesian Approach to Adaptive Video Super-Resolution”, CVPR 2011
[3] Zhou Wang et al., “Video quality assessment based on structural distortion measurement”, Signal Processing: Image Communication, 2004

Level:
MS semester project

Type of work:
50% implementation, 50% research

Supervisor:
Ruofan Zhou ([email protected])

Description:
When compressing or downsampling text images, it is important to preserve the readability of the text. Conversely, when applying super-resolution (the inverse operation of downsampling) to text images, it is also important to increase readability rather than merely increase the sharpness of the image. The goal of this project is to experiment with different network architectures and loss functions for text image downsampling and super-resolution, and to evaluate the results according to readability.
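
One way to turn readability into a number, sketched below under the assumption that Tesseract (via pytesseract) is available: run OCR on the super-resolved image and compare the recognized text against the ground-truth transcription with a character error rate. File names and variables in the usage comments are hypothetical.

import pytesseract
from PIL import Image

def character_error_rate(reference, hypothesis):
    """Levenshtein distance between two strings, normalized by the reference length."""
    m, n = len(reference), len(hypothesis)
    dist = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dist[0] = dist[0], i
        for j in range(1, n + 1):
            cur = min(dist[j] + 1,                                     # deletion
                      dist[j - 1] + 1,                                 # insertion
                      prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev, dist[j] = dist[j], cur
    return dist[n] / max(m, 1)

# Hypothetical usage: a lower CER on the super-resolved image means better readability.
# sr_text = pytesseract.image_to_string(Image.open("sr_output.png"))
# print(character_error_rate(ground_truth_text, sr_text))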

Tasks:
– Literature review on convolutional neural network based methods for image compression and super-resolution
– Collect datasets and perform data augmentation
– Implement some network architectures and loss functions
– Literature review on character/text recognition
– Evaluate the implemented networks

Deliverables:
Report and implementation of deep convolutional networks

Prerequisites:
Experience/interest in convolutional neural networks and in image processing

References:
[1] Zhang, Haochen, Dong Liu, and Zhiwei Xiong, “CNN-based text image super-resolution tailored for OCR”, IEEE Visual Communications and Image Processing, 2017
[2] Peyrard, Clement, et al., “ICDAR2015 competition on text image super-resolution”, International Conference on Document Analysis and Recognition, 2015
[3] Howard, Paul G., “Lossless and lossy compression of text images by soft pattern matching”, Proceedings of the Data Compression Conference (DCC), 1996

Level:
MS semester project (potentially BS)

Type of work:
60% implementation, 40% research

Supervisor:
Ruofan Zhou ([email protected])

Description:
The goal of this project is to evaluate longitudinal chromatic aberration from a single photo in the presence of lateral chromatic aberration. You will acquire a set of photos of printed edges placed at different depths in the scene. The first objective is to remove the lateral chromatic aberration across the image, so that its full width can be used for assessing longitudinal chromatic aberration, which you will work on in the second part of the project. Your final results should then be compared against prior work to evaluate how well they match.
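
As a small illustration of measuring lateral chromatic aberration (not the full protocol of the project), the sketch below estimates the local shift between the red and green channels around an edge crop with OpenCV phase correlation; the image path and crop coordinates are hypothetical.

import cv2
import numpy as np

def channel_shift(patch):
    """Estimate the (dx, dy) displacement of the red channel relative to the green channel."""
    green = patch[:, :, 1].astype(np.float32)
    red = patch[:, :, 2].astype(np.float32)          # OpenCV loads images as BGR
    (dx, dy), _ = cv2.phaseCorrelate(green, red)
    return dx, dy

if __name__ == "__main__":
    image = cv2.imread("edge_photo.png")             # hypothetical edge target photo
    crop = image[500:628, 900:1028]                  # hypothetical crop around one printed edge
    print("red-green lateral shift (pixels):", channel_shift(crop))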

Tasks:
– Review the literature: PSF estimation protocols, lens assessment, and chromatic aberration.
– Capture a dataset of edge images.
– Remove lateral chromatic aberration across the image.
– Evaluate longitudinal chromatic aberration from a single image where the lateral chromatic aberration was corrected.
– Evaluate the results on different lenses/cameras and compare to prior work.

References:
[1] http://www.imatest.com/docs/sfr_chromatic/
[2] https://www.dxomark.com/dxomark-lens-camera-sensor-testing-protocol/

Deliverables:
Report, dataset, and implementation code for lateral chromatic aberration removal and for single-image longitudinal chromatic aberration assessment.

Prerequisites:
Comfortable reading (and writing) code in MATLAB, strong mathematical signal processing background, experience with hardware and (professional) image acquisition techniques.

Type of work:
80% research, 20% development and testing

Level:
BS

Supervisor:
Majed El Helou