Available Projects – Fall 2024

Semester projects are open to EPFL students

Description:
 
Blind face restoration aims to recover high-quality facial images from low-quality counterparts affected by various unknown degradations, including noise, compression, and blur. Recent methods based on deep convolutional networks [1,2,3,4] have demonstrated remarkable progress. Nevertheless, they struggle in extreme degradation scenarios, often arising from severe distortion or large facial poses, which remain a persistent challenge. Although recent approaches leverage different priors, for example 3D priors [5], geometric priors [6], or generative priors [7], to enhance restoration quality, they still exhibit artifacts in extreme cases.
 
Recently, diffusion models have exhibited strong capabilities in generating realistic images, and they have been used in restoration tasks such as image super-resolution and image deblurring [8,9]. In this project, we aim to explore the potential of diffusion models for the demanding task of extreme blind face restoration. We will try to harness the extensive prior knowledge encoded in existing pre-trained diffusion models, examining whether textural or structural information about natural facial images can be extracted from them and used to aid the facial image restoration task.
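As a purely illustrative starting point (not a prescribed approach), the snippet below sketches how intermediate features of a pre-trained face diffusion model could be exposed with a forward hook. The diffusers checkpoint name, the choice of the U-Net mid-block, and the timestep are all assumptions made only for the sake of the example.

    import torch
    from diffusers import DDPMPipeline

    # Load a pre-trained unconditional face diffusion model (checkpoint name is illustrative).
    pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
    unet = pipe.unet.eval()

    features = {}

    def cache_mid_features(module, inputs, output):
        # Cache the bottleneck activations produced during one denoising forward pass.
        features["mid"] = output.detach()

    hook = unet.mid_block.register_forward_hook(cache_mid_features)

    # Treat a (degraded) face image as the noisy sample at some intermediate timestep t
    # and run a single forward pass to get both the noise prediction and the cached features.
    x = torch.randn(1, 3, 256, 256)   # stand-in for a low-quality input, scaled to [-1, 1]
    t = torch.tensor([250])           # intermediate diffusion timestep (assumption)
    with torch.no_grad():
        eps = unet(x, t).sample       # predicted noise
    prior_feat = features["mid"]      # features that may carry facial structure/texture priors

    hook.remove()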
 
 
Key Questions:
 
Instead of training a diffusion model from scratch on the limited number of images available in benchmarks, can we utilize the prior information encoded in diffusion models that were pre-trained on large amounts of data to aid the restoration task? How can we extract the relevant information?
 
After extracting the relevant information, what is the best way to fuse it with the low-quality images to obtain good results?
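One naive possibility, sketched below purely as an assumption and not as the intended design, is to encode the low-quality image with a small CNN and fuse it with the extracted prior features by concatenation; all channel sizes and feature resolutions here are illustrative.

    import torch
    import torch.nn as nn

    class NaiveFusion(nn.Module):
        def __init__(self, prior_channels: int, out_channels: int = 64):
            super().__init__()
            # Lightweight encoder for the degraded input (channel counts are illustrative).
            self.lq_encoder = nn.Sequential(
                nn.Conv2d(3, out_channels, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Merge low-quality features with the diffusion prior features.
            self.fuse = nn.Conv2d(out_channels + prior_channels, out_channels, 1)

        def forward(self, lq_image: torch.Tensor, prior_feat: torch.Tensor) -> torch.Tensor:
            lq_feat = self.lq_encoder(lq_image)
            # Resize the prior features so the two maps can be concatenated along channels.
            prior_feat = nn.functional.interpolate(
                prior_feat, size=lq_feat.shape[-2:], mode="bilinear", align_corners=False
            )
            return self.fuse(torch.cat([lq_feat, prior_feat], dim=1))

    # Example: 512-channel bottleneck features (as in the earlier sketch) and a 256x256 input.
    fusion = NaiveFusion(prior_channels=512)
    fused = fusion(torch.randn(1, 3, 256, 256), torch.randn(1, 512, 8, 8))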
 
Pre-trained diffusion models might be biased towards facial images with typical, near-frontal poses. How can we deal with this bias?
 
 
References:
 
[1]. Wang, Xintao, et al. “Towards real-world blind face restoration with generative facial prior.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[2]. Wang, Zhouxia, et al. “Restoreformer: High-quality blind face restoration from undegraded key-value pairs.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[3]. Zhou, Shangchen, et al. “Towards robust blind face restoration with codebook lookup transformer.” Advances in Neural Information Processing Systems 35 (2022): 30599-30611.
[4]. Gu, Yuchao, et al. “Vqfr: Blind face restoration with vector-quantized dictionary and parallel decoder.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
[5]. Chen, Zhengrui, et al. “Blind Face Restoration under Extreme Conditions: Leveraging 3D-2D Prior Fusion for Superior Structural and Texture Recovery.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 2. 2024.
[6]. Zhu, Feida, et al. “Blind face restoration via integrating face shape and generative priors.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[7]. Yang, Tao, et al. “Gan prior embedded network for blind face restoration in the wild.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[8]. Saharia, Chitwan, et al. “Image super-resolution via iterative refinement.” IEEE transactions on pattern analysis and machine intelligence 45.4 (2022): 4713-4726.
[9]. Zhu, Yuanzhi, et al. “Denoising diffusion models for plug-and-play image restoration.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
 
 
Prerequisites:
Python and PyTorch.
 
 
Type of Work:
MS semester project.
80% research, 20% development
 
 
Supervisor:
Liying Lu ([email protected])

Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. The goal of this project is to verify to what extent a software simulator such as Blender can faithfully simulate the behavior of light interacting with such lenslets.
 
Deliverables: Report and running prototype (Matlab). Blender lenslet simulations.
 
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab
 
Level: BS or MS semester project
 
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Startup company Innoview has developed new moiré features that can prevent counterfeits. Some types of moiré features rely on grayscale images. The present project aims to create a grayscale image editor. Designers should be able to shape their grayscale images by various means (interpolation between spatially defined grayscale values, geometric transformations, image warping, etc.).
 
Deliverables: Report and running prototype (Matlab).
 
Prerequisites:
– knowledge of image processing / computer vision
– coding skills in Matlab
 
Level: BS or MS semester project
 
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Startup company Innoview Sàrl has developed software to recover a message hidden in patterns. Appropriate parameter settings enable the detection of counterfeits. The goal of the project is to define optimal parameters for different sets of printing conditions (resolution, type of paper, type of printing device, complexity of the hidden watermark, etc.). The project involves tests on a large data set and appropriate statistics.

Deliverables: Report and running prototype (Android, Matlab).

Prerequisites:

– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project

Supervisors:

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09

This project aims to explore whether off-the-shelf diffusion models encode semantic information that can help us, and other deep learning models, understand the content of an image or the relationships between images.

Diffusion models [1] have become the new paradigm for generative modeling in computer vision. Despite their success, they remain a black box during generation. At each step, the model provides a direction, namely the score, pointing towards the data distribution. As shown in recent work [2], the score can be decomposed into meaningful components. The first research question is: does the score encode any semantic information about the generated image?
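For concreteness, the small sketch below recalls the standard DDPM relation between the learned noise prediction and the score; the variable names (unet, scheduler) follow the diffusers convention and are assumptions, not part of the project specification.

    import torch

    def score_from_noise_pred(unet, scheduler, x_t, t):
        # Standard DDPM identity: grad_x log p_t(x_t) ~= -eps_theta(x_t, t) / sqrt(1 - alpha_bar_t),
        # so the learned noise prediction directly gives the direction towards the data distribution.
        eps = unet(x_t, t).sample
        alpha_bar_t = scheduler.alphas_cumprod[t].view(-1, 1, 1, 1)
        return -eps / torch.sqrt(1.0 - alpha_bar_t)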

Moreover, there is evidence that the representations learned by diffusion models are helpful to discriminative models. For example, they can boost classification performance through knowledge distillation [3], and a diffusion model can itself be used as a robust classifier [4]. Discriminative information can therefore be extracted from diffusion models. The second question is: what is this information about? Is it about object shape? Location? Texture? Or other kinds of information?
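One simple way to probe this, sketched below under the assumption that diffusion features have already been cached (e.g., with a forward hook at a fixed timestep), is to fit linear classifiers on pooled features for different target attributes and compare their accuracies; all names and hyperparameters here are illustrative.

    import torch
    import torch.nn as nn

    def probe_features(feats: torch.Tensor, labels: torch.Tensor, num_classes: int, epochs: int = 10):
        # feats: (N, C, H, W) cached diffusion features; labels: (N,) integer targets.
        pooled = feats.mean(dim=(2, 3))                       # global average pooling
        probe = nn.Linear(pooled.shape[1], num_classes)       # linear probe on top of frozen features
        opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(probe(pooled), labels)
            loss.backward()
            opt.step()
        acc = (probe(pooled).argmax(dim=1) == labels).float().mean()
        return probe, acc.item()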

This is an exploratory project. We will try to interpret the black box inside diffusion models and dig out the semantic information they encode. Together, we will also brainstorm applications of diffusion models beyond image generation. This project is a good opportunity to develop interest and skills in scientific research.

 

References:

[1] Ho, Jonathan, et al. “Denoising diffusion probabilistic models.” Advances in Neural Information Processing Systems 33 (2020): 6840-6851.

[2] Alldieck, Thiemo, et al. “Score Distillation Sampling with Learned Manifold Corrective.” arXiv preprint arXiv:2401.05293 (2024).

[3] Yang, Xingyi, and Xinchao Wang. “Diffusion model as representation learner.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 18938-18949.

[4] Chen, Huanran, et al. “Your diffusion model is secretly a certifiably robust classifier.” arXiv preprint arXiv:2402.02316 (2024).

 

Deliverables: Code, well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the results.

 

Prerequisites: Python and PyTorch. Basic understanding of diffusion models.

Level: MS research project

Number of students: 1

Contact: Yitao Xu, [email protected]