Available Projects – Spring 2020

Description:

This project combines model robustness with parameter binarization. We will investigate the robustness of networks in which all parameters are binary. Training binary networks in a non-adversarial setting has been well studied in recent years [1, 2]. For non-binary networks, Projected Gradient Descent (PGD) [3] is a straightforward but empirically effective method for obtaining robust models. In this project, we will study the robustness properties of binary networks and further design algorithms to train robust binary networks. (Full description is available in this document.)
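
As an illustration of the inner maximization step used in PGD-based adversarial training, below is a minimal PyTorch sketch; the model, epsilon, and step size are hypothetical placeholders and not part of the project code.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        """Generate adversarial examples within an L-infinity ball of radius eps."""
        # random start inside the epsilon ball
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # ascend the loss, then project back onto the epsilon ball around x
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()

During adversarial training, the training loss would be evaluated on pgd_attack(model, x, y) instead of on the clean batch.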

References:

[1] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. NIPS 2015.

[2] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. 2016.

[3] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.

Deliverables:

Report. Reproducible code. Possible paper submission.

Prerequisites:

Mathematical foundations (calculus, linear algebra). Optimization (gradient descent, primal-dual method). Deep learning.

Level:

MS semester project. (Spring 2020)

Type of work:

20% literature review, 50% research, 30% development and testing.

Supervisor: Chen Liu

Description:

Modern deep learning systems are known to be vulnerable to adversarial attacks: small, carefully designed perturbations can make a state-of-the-art model predict the wrong label with very high confidence. The Fast Gradient Sign Method (FGSM) [1] and Projected Gradient Descent (PGD) [2] are two effective methods for obtaining models that are robust against adversarial attacks. In this project, we will study the validity and strength of FGSM-based and PGD-based adversarial training. Furthermore, we will examine the loss landscape of the training objective under normal training, FGSM-based training, and PGD-based training. (Full description is available in this document.)
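
For the loss-landscape part, a common approach is to evaluate the training loss on a two-dimensional slice of weight space spanned by two random directions. Below is a minimal PyTorch sketch under that assumption; model, loss_fn, data_loader, the grid span, and the resolution are hypothetical placeholders (in practice the directions are often filter-normalized).

    import torch

    def loss_surface(model, loss_fn, data_loader, span=1.0, steps=21):
        """Loss values on a 2-D slice of weight space spanned by two random directions."""
        params = [p.detach().clone() for p in model.parameters()]
        d1 = [torch.randn_like(p) for p in params]
        d2 = [torch.randn_like(p) for p in params]
        alphas = torch.linspace(-span, span, steps)
        surface = torch.zeros(steps, steps)
        x, y = next(iter(data_loader))            # a fixed batch for all grid points
        with torch.no_grad():
            for i, a in enumerate(alphas):
                for j, b in enumerate(alphas):
                    # perturb the weights along the two directions
                    for p, p0, u, v in zip(model.parameters(), params, d1, d2):
                        p.copy_(p0 + a * u + b * v)
                    surface[i, j] = loss_fn(model(x), y).item()
            # restore the original weights
            for p, p0 in zip(model.parameters(), params):
                p.copy_(p0)
        return surface

The resulting grid can be rendered as a contour or surface plot to compare normal, FGSM-based, and PGD-based training.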

References:

[1] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.

[2] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.

Deliverables:

Report. Reproducible code. Visualization of loss landscape studied.

Prerequisites:

Mathematical foundations (calculus, linear algebra). Gradient descent. Deep learning.

Level:

BS semester project. (Spring 2020)

Type of work:

20% literature review, 50% research, 30% development and testing.

Supervisor: Chen Liu

Description:
Context-aware image retargeting aims to arbitrarily adjust an image's aspect ratio while preserving visually salient features. To this end, the salient regions of the image are first estimated, and the image is then retargeted using the saliency map according to the desired aspect ratio. Traditional methods for this task, including those based on deep convolutional neural networks (CNNs), estimate saliency from the image alone. However, an image caption, which describes the image in natural language, could also help estimate the salient regions. In this project, we will propose a novel framework for caption-aware image retargeting that estimates image saliency from the corresponding caption and uses it for image retargeting. In experiments, we will investigate how much the retargeting performance is boosted by leveraging image captions.

References:
[1] M. Assens Reina, X. Giro-i-Nieto, K. McGuinness, and N. E. O’Connor, “SaltiNet: Scan-path prediction on 360 degree images using saliency volumes,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2331–2338.
[2] R. Monroy, S. Lutz, T. Chalasani, and A. Smolic, “SalNet360: Saliency Maps for omni-directional images with CNN,” Signal Process. Image Commun., 2018.
[3] M. Startsev and M. Dorr, “360-aware saliency estimation with conventional image saliency predictors,” Signal Process. Image Commun., vol. 69, pp. 43–52, 2018.
[4] Y. Zhu, G. Zhai, and X. Min, “The prediction of head and eye movement for 360 degree images,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 15–25, Nov. 2018.
[5] J. Ling, K. Zhang, Y. Zhang, D. Yang, and Z. Chen, “A saliency prediction model on 360 degree images using color dictionary based sparse representation,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 60–68, Nov. 2018.

Deliverables: Reproducible code for saliency prediction in VR architectural scenes; report on the experimental results with proper visualization and insights from your research findings.

Prerequisites: Familiarity with deep learning, experience in Python, experience in statistical analysis.

Level: MS semester project (potentially BS).

Type of Work: 50% research, 50% development and testing.

Supervisors: Seungryong Kim ([email protected]), Bahar Aydemir ([email protected]), and Caroline Karmann ([email protected]) from LIPID (ENAC)

Synopsis: Traditional methods for instance-level image segmentation offer limited ability to handle other imaging domains, such as comics, due to the lack of annotated data in these domains. In this project, we will implement state-of-the-art methods for this task and apply them to comics datasets. In addition, we will propose a weakly- or un-supervised instance-level image segmentation method that leverages a domain adaptation technique.
 
References:
[1] P. O. Pinheiro, R. Collobert, and P. Dollár, “Learning to segment object candidates,” NIPS, 2015.
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization,” CVPR, 2016.
[3] A. Rozantsev, M. Salzmann, and P. Fua, “Residual parameter transfer for deep domain adaptation,” CoRR, 2017.
 
Deliverables: Report and reproducible implementations
 
Prerequisites: Experience with deep learning in PyTorch or another framework, computer vision
 
Level: MS semester project

Type of work: 60% research, 40% implementation

Supervisor: Seungryong Kim

Description: Visual saliency refers to the parts of a scene that capture our attention. Conventionally, many saliency detection techniques, including approaches based on deep convolutional neural networks (CNNs), have been developed for natural images. However, these existing techniques perform considerably worse on other imaging domains, such as cartoons, artwork, and comics, due to the lack of annotated saliency maps for these images.
In this project, we will propose a framework to translate natural images and their corresponding saliency maps to the comics domain using a saliency-guided image-to-image translation method. We will then use the translated images to learn saliency detection networks for the comics domain in a weakly-supervised manner.

Tasks:
– Understand the literature and the state of the art
– Implement state-of-the-art image-to-image translation algorithms (see the sketch after this list)
– Develop a method to translate natural images to the comics domain with saliency guidance
– Compare the performance of existing saliency algorithms on natural images, generated images, and comics images
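
A minimal sketch of the cycle-consistency term used by CycleGAN-style translation methods [2, 3] is shown below; G_n2c and G_c2n are hypothetical placeholders for the two generators, and the full objective would also include adversarial and, in this project, saliency-guided attention terms.

    import torch
    import torch.nn.functional as F

    def cycle_consistency_loss(G_n2c, G_c2n, natural, comics, lambda_cyc=10.0):
        """Translate each batch to the other domain and back, then compare."""
        fake_comics = G_n2c(natural)        # natural -> comics
        rec_natural = G_c2n(fake_comics)    # comics -> natural (reconstruction)
        fake_natural = G_c2n(comics)
        rec_comics = G_n2c(fake_natural)
        loss = F.l1_loss(rec_natural, natural) + F.l1_loss(rec_comics, comics)
        return lambda_cyc * loss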
 
References:
[1] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016.
[2] Y. A. Mejjati, C. Richardt, J. Tompkin, D. Cosker, and K. I. Kim, “Unsupervised Attention-guided Image-to-Image Translation,” Advances in Neural Information Processing Systems, 2018.
[3] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, and T. Darrell, “CyCADA: Cycle-Consistent Adversarial Domain Adaptation,” 2017.

Deliverables: At the end of the semester, the student should provide a framework that produces the translated images, along with a report of the work.

Prerequisites: Experience in machine learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 60% research, 40% development and testing

Level: MS semester project

Supervisor: Bahar Aydemir ([email protected]), Seungryong Kim ([email protected])

Description: In this project, you will review the existing literature on weakly-supervised and unsupervised depth estimation and build a model for estimating depth maps in comics images. A myriad of depth estimation techniques have been applied to real-world images and found to work considerably well. However, it is challenging to achieve the same results in other image domains such as comics. A possible solution to this problem is domain adaptation, where you may take a model pretrained on a natural-image dataset and transfer it to a comics dataset. A better solution is to develop a weakly-supervised technique for depth estimation in the comics domain. A good starting point is [3].

Concretely, you will propose a framework for translating natural images to the comics domain [1-2] and a depth estimation network for the comics domain trained in a weakly-supervised manner. You may contact any of the supervisors at any time should you want to discuss the idea further.

Tasks:
– Understand the literature and our framework.
– Implement an existing state-of-the-art (SOTA) depth estimation model trained on natural images.
– Develop a method to translate natural images to comics images using the depth maps of the natural images.
– Compare the performance of existing SOTA models on natural images, generated images, and comics images.

Deliverables: At the end of the semester, the student should provide a framework for depth estimation in the comics domain, along with a project report based on this work.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
[1] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, in IEEE International Conference on Computer Vision (ICCV), 2017. 
[2] Youssef A. Mejjati, Christian Richardt, James Tompkin, Darren Cosker, and Kwang In Kim, “Unsupervised Attention-guided Image-to-Image Translation”, in Advances in  Neural Information Processing Systems (NIPS), 2018.
[3] Andrea Pilzer, Dan Xu, Mihai Marian Puscas, Elisa Ricci and Nicu Sebe, “Unsupervised Adversarial Depth Estimation using Cycled Generative Networks”, in Proceedings of the 6th International Conference on 3D Vision (3DV 2018). IEEE, 2018.
[4] Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, and Raquel Urtasun. “Monocular Object Instance Segmentation and Depth Ordering with CNNs”, in IEEE International Conference on Computer Vision (ICCV), 2015. 

Level: Master

Supervisor: Deblina Bhattacharjee ([email protected]), Seungryong Kim ([email protected])

Description: In this project, you will work on the recommendation system of Sofy.tv. Sofy.tv uses a recommendation system based on the recipes of the movies, which are found through our multi-channel deep learning system. The goal of this project is to improve the recommendations by finding the best matches for each user's taste.

Tasks:
– Understand the literature and our framework
– Revise our taste clustering system
– Revise our matchmaking system between the users and films.
– Test the revised model.

Deliverables: At the end of the semester, the student should provide an enhanced recommendation framework.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Basic experience in web programming.

Type of work: 50% research, 50% development and testing.

Level: Master

Supervisor: Sami Arpa ([email protected])

Description: In this project, you will work on the development of machine learning models for predicting movie performance on streaming platforms. Based on our database of ~20,000 movie trailers and ~9,000 movie scripts/transcripts, and using our deep learning model for automatic video and text genre detection, you will develop a solution to predict the performance of movies on streaming platforms by combining video/script analysis with features extracted from network representations of the Internet Movie Database.

Tasks:
– Understand the literature and our framework.
– Perform an in-depth statistical analysis of our movie database.
– Implement and test different machine learning approaches for movie performance prediction.
– Test the models on case studies.

Deliverables: At the end of the semester, the student should have implemented and tested machine learning models for movie performance prediction.

Prerequisites: Experience in deep learning and machine learning, experience in Python. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
– M. Ghiassi, David Lio, Brian Moon, Pre-production forecasting of movie revenues with a dynamic artificial neural network, Expert Systems with Applications, Volume 42, Issue 6, 2015, Pages 3176-3193
– Simonoff, J. S. and Sparrow, I. R. Predicting movie grosses: Winners and losers, blockbusters and sleepers. In Chance, 2000.

Level: Master

Supervisor: Sami Arpa ([email protected])

Description: In this project, you will work on a tool for the automatic generation of metadata from commercial videos. Based on a database of ~80,000 commercials, and using our deep learning model for automatic video pattern detection, you will develop a solution to automatically generate relevant keywords for any given commercial video.

Tasks:
– Understand the literature and our framework.
– Perform an in-depth statistical analysis of the database of commercials.
– Implement a metadata generation solution on top of our deep learning model.
– Develop a testing solution and test the model on a case study.

Deliverables: At the end of the semester, the student should provide a framework for automatic metadata prediction.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
– Harper, F. Maxwell, and Joseph A. Konstan. “The MovieLens datasets: History and context.” ACM Transactions on Interactive Intelligent Systems (TiiS) 5.4 (2016): 19.
– Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. “Large-scale Video Classification with Convolutional Neural Networks.” The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725-1732.

Level: Master

Supervisor: Sami Arpa ([email protected])

Description: 

Visual saliency refers to the parts of a scene that capture our attention. Current approaches for saliency estimation use eye-tracking data on natural images to construct ground truth. In this project, however, we will perform eye tracking on comics pages instead of natural images. Later, we will use the collected data to estimate saliency in the comics domain. You will work on an eye-tracking experiment with mobile eye-tracking glasses.

Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an eye tracking experiment according to the given instructions.
– Perform a detailed analysis of the collected data by producing heatmaps, scanpaths, and histograms (a heatmap sketch is shown after this list).
– Evaluate a state-of-the-art saliency estimation model on the collected data and compare the results with existing results on natural images.
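
A fixation heatmap can be obtained by accumulating fixation points in an empty map and smoothing it with a Gaussian kernel. A minimal NumPy/SciPy sketch, where the image size and the smoothing sigma are hypothetical:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def fixation_heatmap(fixations, height, width, sigma=30):
        """Build a saliency-style heatmap from (x, y) fixation coordinates."""
        heatmap = np.zeros((height, width), dtype=np.float64)
        for x, y in fixations:
            if 0 <= int(y) < height and 0 <= int(x) < width:
                heatmap[int(y), int(x)] += 1.0          # accumulate fixations
        heatmap = gaussian_filter(heatmap, sigma=sigma) # smooth into a heatmap
        if heatmap.max() > 0:
            heatmap /= heatmap.max()                    # normalize to [0, 1]
        return heatmap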

Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.

Type of work: 20% research, 80% development and testing

References:

[1] A. Borji and L. Itti, “CAT2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 Workshop on “Future of Datasets”, 2015.
[2] K. Kunze, Y. Utsumi, Y. Shiga, K. Kise, and A. Bulling, “I know what you are reading: Recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, Zurich, Switzerland, September 2013.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.

Level: BS semester project

Supervisor: Bahar Aydemir ([email protected])

Description:
Context-aware image retargeting aims to arbitrarily adjust an image's aspect ratio while preserving visually salient features. To this end, the salient regions of the image are first estimated, and the image is then retargeted using the saliency map according to the desired aspect ratio. Traditional methods for this task, including those based on deep convolutional neural networks (CNNs), estimate saliency from the image alone. However, an image caption, which describes the image in natural language, could also help estimate the salient regions. In this project, we will propose a novel framework for caption-aware image retargeting that estimates image saliency from the corresponding caption and uses it for image retargeting. In experiments, we will investigate how much the retargeting performance is boosted by leveraging image captions.

References:
[1] Goferman et al., “Context-Aware Saliency Detection,” TPAMI, 2012
[2] Cho et al., “Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting,” ICCV, 2017
[3] Wang et al., “Learning to Detect Salient Objects with Image-level Supervision,” CVPR, 2017
[4] Zeng et al., “Multi-source Weak Supervision for Saliency Detection,” CVPR, 2019

Deliverables: Report, running prototype, and research paper if possible.

Prerequisites: Experience in computer vision, machine learning, and especially deep learning.

Level: MS semester project (potentially BS).

Type of Work: 50% research, 50% development and testing.

Supervisor: Seungryong Kim

Description:
Image restoration with machine learning has attracted a lot of research interest with the recent advancements in deep convolutional networks. A multitude of methods have been proposed for super-resolution, deblurring, denoising, etc. Each has limitations and drawbacks that make it a poor candidate for generalization and real-world use. In this project, you will inspect a set of state-of-the-art image restoration models and analyze them in a thorough empirical study. The focus will be on joint super-resolution and denoising, and you can (optionally) implement an extension idea that we can propose to you.
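
For the empirical study, restoration quality is typically reported with metrics such as PSNR (and SSIM). A minimal NumPy sketch of PSNR, assuming images scaled to [0, 1]:

    import numpy as np

    def psnr(restored, reference, data_range=1.0):
        """Peak signal-to-noise ratio (in dB) between a restored image and its reference."""
        mse = np.mean((restored.astype(np.float64) - reference.astype(np.float64)) ** 2)
        if mse == 0:
            return float('inf')
        return 10.0 * np.log10((data_range ** 2) / mse)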

References:

Zhou, Ruofan, et al. “A comparative study on wavelets and residuals in deep super resolution”. Electronic Imaging. 2019.

Example methods to run:
https://github.com/cszn/SRMD
https://github.com/cszn/IRCNN
https://github.com/cszn/DPSR

Deliverables:
Reproducible code for all experiments. Experimental results with proper visualization to draw insights from your research findings.

Prerequisites:
Basic image processing knowledge, some familiarity with deep learning.

Level:
MS semester project.

Type of work:
60% implementation, 40% research.

Supervisors:
Majed El Helou, Ruofan Zhou.

Description:
In the context of computer vision and image processing, video registration refers to aligning two videos of the same scene. This project targets the registration of two different imaging modalities, namely RGB and infrared (IR). While RGB is related to human perception, IR is related to heat. We present a challenging dataset of RGB and IR video pairs. For this task, classical methods cannot achieve satisfactory performance since the camera pair moves slightly during video capture. Also, labelling the dataset is costly and impractical. Therefore, we target an unsupervised deep-learning-based solution to this problem. Interested students are encouraged to have a look at related works, including [1] and [2].
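
As a rough illustration of the unsupervised approach in the spirit of [1], the sketch below warps the IR frame with a predicted displacement field and penalizes a similarity term plus a smoothness term. The flow network, the loss weights, and the L1 similarity are hypothetical placeholders; for RGB-IR pairs, a cross-modal similarity such as normalized cross-correlation or mutual information would be more appropriate, and the two inputs are assumed here to have matching shapes.

    import torch
    import torch.nn.functional as F

    def warp(frame, flow):
        """Warp a frame (N, C, H, W) with a dense displacement field (N, 2, H, W)."""
        n, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        grid = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
        new = grid.unsqueeze(0) + flow                                 # displaced coordinates
        # normalize coordinates to [-1, 1] as required by grid_sample
        new_x = 2.0 * new[:, 0] / (w - 1) - 1.0
        new_y = 2.0 * new[:, 1] / (h - 1) - 1.0
        sample_grid = torch.stack((new_x, new_y), dim=-1)              # (N, H, W, 2)
        return F.grid_sample(frame, sample_grid, align_corners=True)

    def registration_loss(flow_net, rgb, ir, lambda_smooth=0.1):
        flow = flow_net(torch.cat((rgb, ir), dim=1))   # predict the displacement field
        ir_warped = warp(ir, flow)
        similarity = F.l1_loss(ir_warped, rgb)         # placeholder; use NCC/MI for RGB-IR
        smooth = flow.diff(dim=2).abs().mean() + flow.diff(dim=3).abs().mean()
        return similarity + lambda_smooth * smooth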

References:

[1] An Unsupervised Learning Model for Deformable Medical Image Registration, CVPR, 2018

[2] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV, 2017

Deliverables:
Report and running prototype.

Prerequisites:
Basic image processing knowledge and deep learning.

Level:
BS semester project. (Spring 2020)

Type of work:
50% research, 50% development and testing.

Supervisor:
Hakki Can Karaimer

Description:
This project targets the joint registration and fusion of two different imaging modalities, namely RGB and infrared (IR). While RGB is related to human perception, IR is related to heat. We present a challenging dataset of RGB and IR video pairs. In this context, “video registration” aims to align two videos of the same scene, and “video fusion” is the process of bringing all the essential information from two videos into a single video. The main goal of this task is therefore to fuse the IR and RGB videos of the same scene. Classical methods cannot achieve satisfactory performance since the camera pair moves slightly during video capture. Also, labelling the dataset is costly and impractical. As a result, we target an unsupervised deep-learning-based solution to this problem. Interested students are encouraged to have a look at related works, including [1], [2], and [3].

References:

[1] An Unsupervised Learning Model for Deformable Medical Image Registration, CVPR, 2018

[2] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV, 2017

[3] Fast and Efficient Zero-Learning Image Fusion, arXiv, 2019

Deliverables:
Report and running prototype.

Prerequisites:
Basic image processing knowledge and deep learning.

Level:
MS semester project. (Spring 2020)

Type of work:
50% research, 50% development and testing.

Supervisor:
Hakki Can Karaimer

Description:
Microscopy imaging is crucial for medical research. Multiple imaging techniques are available, each with different advantages, for capturing different material properties at a microscopic scale. The goal of this project is to collect a microscopy dataset. You will then perform a set of evaluation experiments to analyze the performance of different algorithms on your dataset.

Deliverables:
Microscopy image dataset, and benchmarking on a set of available algorithms.

Prerequisites:
Basic image processing knowledge. Deep learning experience could be a plus.

Level:
MS semester project (potentially BS).

Type of work:
60% implementation, 40% research.

Supervisors:
Majed El Helou, Ruofan Zhou.

Description: Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses line halftones to display simple graphical elements such as a logo. The software has now been extended to hide the watermark in dispersed line segments. The project consists of adapting this software to run on an Android smartphone and tuning and optimizing the available parameters.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project or possibly master project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09

Description: Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. That Epson P50 printer has now been replaced by new types of Epson printers that require modified driver software. The project consists of understanding the previous driver software and modifying it so as to be able to drive the new Epson printer. Possibly, reverse engineering will be necessary to obtain some of the new, undocumented driver codes.

Deliverables: Report and running prototype (C, C++ or Matlab).

Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++ or Matlab

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09

Description: Startup company Innoview Sàrl has developed software to recover a watermark with a smartphone by superposing a software revealer on top of a base image obtained by camera acquisition. The project aims to extend this software in order to recover watermarks printed on a curved surface, such as the label of a wine bottle.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing
– basic coding skills in Matlab and Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09

Description: The goal of this project is to evaluate longitudinal (axial) chromatic aberration from a single photo in the presence of lateral chromatic aberration. You will acquire a set of photos of printed edges placed at different depths in the scene. The objective is to remove the lateral chromatic aberration across the image so that its full width can be used for assessing longitudinal chromatic aberration, which you will work on in the second part of the project. Your final results should then be compared to prior work to evaluate how well they match. The utility of the results lies in fusing images for increased depth of field.

Tasks:
Review the literature: PSF estimation protocols, lens assessment, and chromatic aberration.
Capture a dataset of edge images.
Remove lateral chromatic aberration across the image (see the sketch after this list).
Evaluate longitudinal chromatic aberration from a single image where the lateral chromatic aberration was corrected.
Evaluate the results on different lenses/cameras and compare to prior work.
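
As a starting point for measuring lateral chromatic aberration, one simple option is to locate the edge position in each color channel with sub-pixel precision and compare the per-channel offsets. A minimal NumPy sketch over a crop containing a single, roughly vertical dark-to-bright edge; the 50%-crossing criterion and the crop assumptions are illustrative, not a prescribed protocol.

    import numpy as np

    def edge_position(profile):
        """Sub-pixel index where a 1-D profile crosses 50% between its min and max."""
        half = (profile.min() + profile.max()) / 2.0
        idx = int(np.argmax(profile >= half))      # first sample above the midpoint
        if idx == 0:
            return 0.0
        x0, x1 = profile[idx - 1], profile[idx]
        # linear interpolation between the two samples around the crossing
        return (idx - 1) + (half - x0) / (x1 - x0)

    def lateral_ca_offsets(edge_crop):
        """Mean R-G and B-G edge offsets (in pixels) for a crop with a vertical edge.
        edge_crop: float array of shape (H, W, 3), dark on the left, bright on the right."""
        positions = []
        for c in range(3):
            rows = [edge_position(edge_crop[r, :, c]) for r in range(edge_crop.shape[0])]
            positions.append(np.mean(rows))
        r_pos, g_pos, b_pos = positions
        return r_pos - g_pos, b_pos - g_pos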

References: El Helou, Majed, Frederike Dümbgen, and Sabine Süsstrunk. “AAM: An Assessment Metric of Axial Chromatic Aberration.” IEEE International Conference on Image Processing (ICIP), 2018.

El Helou, Majed, Zahra Sadeghipoor, and Sabine Süsstrunk. “Correlation-based deblurring leveraging multispectral chromatic aberration in color and near-infrared joint acquisition.” IEEE International Conference on Image Processing (ICIP), 2017.

http://www.imatest.com/docs/sfr_chromatic/

https://www.dxomark.com/dxomark-lens-camera-sensor-testing-protocol/

Deliverables: Report, dataset, and implementation code for lateral chromatic aberration removal and for single-image longitudinal chromatic aberration assessment.

Prerequisites: Comfortable reading (and writing) code in MATLAB, strong mathematical signal-processing background, experience with hardware and (professional) image acquisition techniques.

Type of work: 80% research, 20% development and testing

Level: BS

Supervisor: Majed El Helou