Research Projects are open to EPFL students.
Description:
- Use of different voices for different characters
- Automated understanding of the emotions suggested by a given part of the dialogue
- Adaptation of the generated audio to the detected emotions (e.g., emphasis, speaking rate, pitch, volume)
Prerequisites:
- Coding skills in Python
- Experience or interest in Natural Language Processing
- Experience or interest in Generative AI
- Experience or interest in Text-to-Speech models
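As a rough illustration of the emotion-to-prosody adaptation described above, here is a minimal Python sketch; the emotion labels, the prosody values, and the use of pyttsx3 as the synthesis backend are placeholders only (a neural TTS model would be needed for pitch and emphasis control).

```python
import pyttsx3

# Illustrative mapping from a detected emotion to prosody settings; the values are made up.
EMOTION_PROSODY = {
    "neutral": {"rate": 175, "volume": 0.8},
    "excited": {"rate": 210, "volume": 1.0},
    "sad":     {"rate": 140, "volume": 0.6},
}

def speak(text, emotion="neutral"):
    engine = pyttsx3.init()
    prosody = EMOTION_PROSODY.get(emotion, EMOTION_PROSODY["neutral"])
    engine.setProperty("rate", prosody["rate"])       # speaking rate (words per minute)
    engine.setProperty("volume", prosody["volume"])   # volume in [0.0, 1.0]
    engine.say(text)
    engine.runAndWait()

speak("I can't believe we made it!", emotion="excited")
```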
Supervisors: [email protected] ; [email protected]
Description
Single-image super-resolution is the task of increasing an image's resolution by inferring its missing high frequencies. For this task, neural networks are commonly trained on large datasets of natural images [1, 2]. While these models produce impressive results on natural images, they fail to generalize to unseen domains such as drone images. To tackle this issue, [3] proposed DSR, a drone super-resolution dataset along with a baseline network. Each scene is acquired at multiple heights with two focal lengths on two different cameras (tele camera: the ground-truth HR image / normal camera: the LR image). In this project, we will work on improving the proposed approach. Possible directions of improvement are: (1) the domain gap between the two cameras, for which we may train a camera-to-camera mapping [4, 5]; (2) exploiting images at different heights to improve the current results; (3) proposing new architectures [6]; (4) working on a lightweight distillation of the model that could run in real time.
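For orientation, a paired training step on DSR-style data might look as follows; this is a minimal sketch in which the model, optimizer, and data loading are left abstract, and the L1 pixel loss is just one common choice.

```python
import torch.nn as nn

def train_step(model, optimizer, lr_img, hr_img, loss_fn=nn.L1Loss()):
    """One paired training step (sketch): `lr_img` is the normal-camera crop,
    `hr_img` the aligned tele-camera ground truth from DSR."""
    optimizer.zero_grad()
    sr = model(lr_img)            # super-resolved prediction
    loss = loss_fn(sr, hr_img)    # pixel loss against the tele-camera ground truth
    loss.backward()
    optimizer.step()
    return loss.item()
```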
References
[1] SwinIR: Image Restoration Using Swin Transformer, 2021
[2] Zoom To Learn, Learn To Zoom, 2019
[3] DSR Towards Drone Image Super-Resolution, 2022
[4] Cross-Camera Convolutional Color Constancy, 2021
[5] Semi-Supervised Raw-to-Raw Mapping, 2021
[6] Image Super-Resolution via Iterative Refinement, 2021
Deliverables
Project report and reproducible code.
Prerequisites
Experience with Python and PyTorch; preferably some expertise in image processing and machine learning.
Level
MS semester project
Type of Work
60 % research 40 % implementation
Supervisors
Raphael Achddou ([email protected])
Description
Low-light image enhancement is a critical image restoration task for camera and smartphone manufacturers. Classic approaches exploit bursts of photos and apply collaborative filtering to the acquired stack of images; however, this is time-consuming and not very efficient. In 2018, [1] proposed to learn the full image processing pipeline for RAW low-light images and released a large RAW image dataset. While this work was impactful, acquiring such a dataset is a tedious and time-consuming task. To tackle this issue, [2] proposes to synthesize the distortion with a complex but physically sound distortion model. However, this paper assumes that the noise distribution is the same in every pixel, while noise is very likely spatially varying for a specific sensor. Another approach consists in learning the noise distribution by means of a generative network. While this has been done successfully for standard RAW noise [3, 4, 5], no particular effort has been dedicated to extreme low-light settings. In this project, we will apply these techniques to model extreme low-light noise and try out diffusion models for noise generation [6]. We will validate our noise model by training a low-light image enhancement model as well as by computing statistical scores.
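To make the noise-modeling part concrete, here is a minimal sketch of a heteroscedastic (Poisson + Gaussian + row) noise synthesizer on a linear RAW plane; the parameter values are illustrative, [2] calibrates them per sensor, and this project would additionally let them vary spatially or learn them with a generative model.

```python
import numpy as np

def synthesize_low_light_raw(clean_raw, gain=16.0, read_std=2.0, row_std=0.5, black_level=0.0):
    """Sketch of low-light noise synthesis on a single linear RAW (Bayer) plane.
    clean_raw: 2D array of clean linear intensities; all parameters are illustrative."""
    photons = np.random.poisson(np.clip(clean_raw, 0, None) / gain) * gain   # signal-dependent shot noise
    read = np.random.normal(0.0, read_std, clean_raw.shape)                  # signal-independent read noise
    row = np.random.normal(0.0, row_std, (clean_raw.shape[0], 1))            # row / banding noise
    return photons + read + row + black_level
```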
References
[1] Learning to See in the Dark, 2018
[2] A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising, 2020
[3] Modeling sRGB Camera Noise with Normalizing Flows, 2022
[4] C2n: Practical generative noise modeling for real-world denoising, 2021
[5] Noise Modeling with Conditional Normalizing Flows, 2019
[6] Realistic Noise Synthesis with Diffusion Models, 2023
Deliverables
Project report and reproducible code.
Prerequisites
Experience with Python and PyTorch; preferably some expertise in image processing and machine learning.
Level
MS semester project
Type of Work
60 % research 40 % implementation
Supervisors
Raphael Achddou ([email protected])
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image. The present project aims at improving the registration precision between an image captured with a smartphone and a reference image. The robustness to changes in the acquisition conditions should also be improved. Registrations carried out with different types of registration symbols or patterns are to be characterized and compared.
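For context, feature-based registration between a captured photo and the reference image typically follows the pattern below; this is a Python/OpenCV sketch only (the actual prototypes are in Java Android / Matlab), and ORB matching with a RANSAC homography is one standard choice among many.

```python
import cv2
import numpy as np

def register_to_reference(captured_path, reference_path):
    """Estimate the homography mapping the captured photo onto the reference image (sketch)."""
    captured = cv2.imread(captured_path, cv2.IMREAD_GRAYSCALE)
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(captured, None)
    k2, d2 = orb.detectAndCompute(reference, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)        # robust homography captured -> reference
    warped = cv2.warpPerspective(captured, H, reference.shape[::-1])  # resample into the reference frame
    return H, warped
```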
Deliverables: Report and running prototypes (Android and possibly PC).
Prerequisites:
– knowledge of image processing / computer vision
– Coding skills in Java Android and/or Matlab
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Description:
Supervised image segmentation requires strict class definitions and costly annotations from humans. Since annotation is costly, supervision is usually limited to small datasets. Methods trained on small datasets do not generalize to unseen classes and unseen image domains. Recent advances in vision foundation models [1, 2] and self-supervised learning [3] provide semantically rich features learned on large datasets. Even though class definitions and annotations are limited or missing, these methods [1, 2, 3] can be combined with unsupervised learning algorithms to solve the segmentation task without requiring segmentation annotations. Our aim in this project is to develop and use unsupervised learning methods to segment images from various domains (e.g., synthetic and real image domains).
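A minimal sketch of the unsupervised step, assuming per-patch features of shape [h*w, C] have already been extracted from a frozen foundation model such as [1] or [3] (feature extraction itself is left out):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_from_features(features, h, w, n_segments=8):
    """Cluster per-patch descriptors into pseudo-segments without any annotations.
    `features`: [h*w, C] array; how it is obtained depends on the chosen backbone."""
    labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(features)
    return labels.reshape(h, w)   # coarse segmentation map, one label per patch
```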
References:
[1] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International conference on machine learning. PMLR, 2021
[2] Kirillov, Alexander, et al. “Segment anything.” arXiv preprint arXiv:2304.02643 (2023).
[3] Melas-Kyriazi, Luke, et al. “Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Deliverables: Report and reproducible implementations
Prerequisites: Experience with Deep Learning, Computer Vision, PyTorch
Level: MS semester project
Type of work: 40% research, 60% implementation
Supervisors: Baran Ozaydin ([email protected])
Description:
Startup company Innoview Sàrl has developed software for creating compact custom codes that enable distinguishing an original from a counterfeited document. To detect counterfeits, the code's parameters need to be adapted to the printing environment: paper type, printer type, printer resolution, and ink type. The goal of the project is to find the best parameters for different kinds of printing environments.
Deliverables: Report and running prototype (Matlab).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab
Level: BS or MS semester project
Supervisor:
Prof. Roger D. Hersch, BC 110, [email protected], cell: 077 406 27 09

Creative images generated by DALL·E 2.
3D objects created with DreamFusion from text prompts (DreamFusion, Ben Poole et al.). This project is in this line of work and aims to improve the quality of generated 3D content.
Description
In recent years, a novel class of generative models known as Diffusion Models (DMs) has emerged. These models define a forward process that gradually adds small amounts of Gaussian noise to data samples, and a learnable reverse process (the generation process) that gradually removes the noise. When applied to natural images, this modeling approach has been shown to outperform state-of-the-art Generative Adversarial Networks (GANs).
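The forward (noising) process has a simple closed form; the sketch below assumes a precomputed tensor `alphas_cumprod` holding the cumulative products of the noise schedule.

```python
import torch

def forward_diffusion(x0, t, alphas_cumprod):
    """Closed-form DDPM forward process (sketch):
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)          # per-sample cumulative schedule value
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise                                    # the model learns to predict `noise` from x_t
```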
Though 2D images are natural for screens and printing, it would be desirable to generate 3D content that can be viewed from different angles. However, collecting 3D data is tedious, unlike 2D photos that can be captured daily with smartphones, and naively scaling 2D models to 3D is usually computationally infeasible.
In this project, we will use 2D image supervision for 3D content creation. We will guide 3D content creation with 2D supervision through several methods, such as rasterization and volume rendering.
- Stable diffusion: https://github.com/Stability-AI/stablediffusion
- 3D content: https://dreamfusion3d.github.io/
Deliverables
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites
- Python and PyTorch.
- Experience with Deep Learning methods and Convolutional Networks
Level: Master Student
- Supervisor: Yufan Ren (website), Email: [email protected]

Motivated students are welcome to join our team for the next DeepFakes detection competition (August-September) at ICCV'23. Photo: the ceremony of the Trusted Media DeepFakes detection challenge in 2022. We also hosted the DeepFakes stand at EPFL's Open Day.
Description:
DeepFakes are fake images (especially face images) synthesized with machine learning algorithms. Since first appearing only five years ago, DeepFake generation technology has been evolving in several aspects: quality, speed, and data efficiency. As facial expression and identity are central to social interaction and to trust in media, there is a persistent interest in developing robust and efficient DeepFake detection algorithms.
DeepFake videos are the video counterparts of DeepFake images. Moving from images to videos opens up several new possibilities (much as we witnessed going from image understanding to video understanding). For instance, video is time-series data that offers consistency between frames, but faces in DeepFake videos might not move as naturally as in real videos.
For inquiries, feel free to drop an email : )
Level of work:
- MS Level, semester project / master project
Prerequisite:
- Knowledge in deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and Computer Vision
Supervisor:
- Yufan Ren, website
Type of work:
- 50% research, 50% development
Deliverables:
- Code, well cleaned up and easily reproducible
- Written report, explaining the literature and steps taken for the project
Introduction:
Recent text-to-2D models such as Stable Diffusion, trained on billions of text-image pairs, have achieved stunning image synthesis quality. However, such models require large-scale datasets and industrial-scale training, a process that is hard to carry over to 3D scene generation.
To deal with this problem, DreamFusion, the first work to bring diffusion models to 3D generation with the help of NeRF, accelerates this process and creates view-consistent objects with mesh models suited to a generalized forward rendering pipeline. See https://dreamfusion3d.github.io/ for a collection of generated results.
This project focuses on text-driven 3D content generation, looking at possibilities of exploiting pretrained 2D diffusion models or similar architectures with 3D vision priors to accomplish the following tasks: 1. create larger-scale scenes with realistic environment surroundings; 2. generate specific materials with particular viewing effects that require a 3D understanding of scene intrinsics, such as transparent objects; 3. editable texture / stylization for scene meshes controlled via text input. We will be looking at CLIP- and diffusion-based models. A work close to this project can be found in Text2Mesh in the reference section.
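At the core of DreamFusion-style generation is Score Distillation Sampling (SDS), which back-propagates a frozen 2D diffusion prior into a differentiable render. Below is a rough sketch under stated assumptions: `eps_model` is a placeholder for whichever pretrained noise-prediction network is used, a 1000-step schedule is assumed, and the weighting is one common choice.

```python
import torch
import torch.nn.functional as F

def sds_loss(rendered, text_emb, eps_model, alphas_cumprod):
    """One SDS step (sketch). `rendered` is a differentiable render of the 3D scene, [B, C, H, W]."""
    B = rendered.shape[0]
    t = torch.randint(50, 950, (B,), device=rendered.device)        # random timestep (1000-step schedule assumed)
    noise = torch.randn_like(rendered)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    x_t = a_bar.sqrt() * rendered + (1 - a_bar).sqrt() * noise      # forward-noise the render
    with torch.no_grad():
        eps_pred = eps_model(x_t, t, text_emb)                      # frozen 2D diffusion prior
    grad = (1 - a_bar) * (eps_pred - noise)                         # SDS gradient w.r.t. the render
    # Trick: this MSE has exactly `grad` as its gradient with respect to `rendered`.
    target = (rendered - grad).detach()
    return 0.5 * F.mse_loss(rendered, target, reduction="sum") / B
```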
Type of Work:
- MS Level: semester project / master project
- 60% research, 40% development
- We encourage collaboration / teamwork so this project can take 2 students working on different aspects of the research problem.
Supervisor:
- Dongqing Wang, [email protected]
Prerequisite:
- Have taken a Machine Learning course and a Computer Vision course.
- Have sufficient Pytorch knowledge.
- Students will be working extensively with 3D scene generation and possibly editing, therefore prior experience with 3D vision and / or computer graphics knowledge will be a plus.
- Experience with diffusion models and / or CLIP will be a plus.
Reference Literature:
- Poole, Ben, et al. “Dreamfusion: Text-to-3d using 2d diffusion.” arXiv preprint arXiv:2209.14988 (2022).
- Chen, Yongwei, et al. “TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition.” arXiv preprint arXiv:2210.11277 (2022).
- Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International Conference on Machine Learning. PMLR, 2021.
- Michel, Oscar, et al. “Text2mesh: Text-driven neural stylization for meshes.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- Haque, Ayaan, et al. “Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions.” arXiv preprint arXiv:2303.12789 (2023).
- https://github.com/ashawkey/stable-dreamfusion
- https://github.com/CompVis/stable-diffusion
Learned Neural Radiance Fields as scene representations show nice geometry reconstruction but require more than 30 input images. (NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Ben Mildenhall et al.)

Generalizable neural representations work with as few as three input views using generalizable priors. (SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views, Long et al.) This project is in this research line and aims to improve it further.

Our previous work in this line of research, which aims at achieving finer details, will be presented at CVPR'23.
Description:
Neural Rendering is a branch of rendering technique that replaces one or more parts of the rendering pipeline with neural networks. The inductive bias of neural networks has been proven helpful in many Neural Rendering tasks, such as novel view synthesis (NeRF), surface reconstruction (volSDF), and material acquisition (NeRD).
In this project, we are interested in the generalization ability of Neural Rendering, e.g., given a set of images, infer novel views directly. There are several benefits of using generalizable features. Firstly, we skip the long training procedure with a fast-forward inference. Secondly, learnable features are beneficial in sparse input cases. Thirdly, the framework enables further optimization to improve quality.
We will build a generalizable rendering model based on existing network designs such as IBRNet and MVSNet, and train it using a pixel loss, a depth loss, and a patch warping loss.
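For reference, NeRF-style models composite per-sample densities and colors along each ray via volume rendering; a minimal sketch of that compositing step:

```python
import torch

def composite(sigmas, colors, deltas):
    """Volume rendering along rays (sketch).
    sigmas: [N_rays, N_samples] densities, colors: [N_rays, N_samples, 3], deltas: inter-sample distances."""
    alpha = 1.0 - torch.exp(-sigmas * deltas)                      # opacity of each segment
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
    )[:, :-1]                                                      # transmittance up to each sample
    weights = alpha * trans
    rgb = (weights.unsqueeze(-1) * colors).sum(dim=1)              # expected color per ray
    return rgb, weights
```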
Level
MS Level: semester project/master project
Prerequisite:
Knowledge in deep learning frameworks (e.g., PyTorch), image processing, and Computer Vision.
Supervisors:
- Yufan Ren (website), Email: [email protected]
Type of work:
50% research, 50% development
References:
[1] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image. The software needs to be adapted to run in C++ on diverse platforms, such as Android and PC. Performance tests need to be carried out for different printing devices and parameter settings.
Deliverables: Report and running prototypes (Android and PC).
Prerequisites:
– knowledge of image processing / computer vision
– Coding skills in C++ and Java Android
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover watermarks hidden in grayscale images. Appropriate parameter settings enable the detection of counterfeits. The goal of the project is to define optimal parameters for different sets of printing conditions (resolution, type of paper, type of printing device, complexity of the hidden watermark, etc.). The project involves tests on a large data set and appropriate statistics.
Deliverables: Report and running prototype (Android, Matlab).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and Java Android
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Description:
Learning joint representations for language and vision [1], generating images from text [2], and describing an image via text [3] have led to interesting results and research applications. However, these methods operate on a global level, i.e., a generated image matches a sentence or a generated text matches an image. Our goal in this project is to describe smaller regions of an image with words. We will use state-of-the-art diffusion models and language-pretrained vision models to represent an image as words.
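One naive baseline for assigning words to regions is to score crops of the image against a candidate vocabulary with a pretrained CLIP model [1]; a rough sketch is below, where the vocabulary, the image path, and the grid of crops are all placeholders.

```python
import torch
import clip                      # OpenAI CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

words = ["a dog", "a tree", "the sky", "a car"]                    # illustrative candidate vocabulary
image = Image.open("example.jpg")                                  # hypothetical input image
regions = [image.crop((0, 0, 224, 224)), image.crop((224, 0, 448, 224))]   # naive grid of regions

with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(words).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    for i, crop in enumerate(regions):
        img_feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        best = (img_feat @ text_feat.T).argmax().item()            # most similar word for this region
        print(f"region {i}: {words[best]}")
```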
References:
[1] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International conference on machine learning. PMLR, 2021.
[2] Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[3] Gal, Rinon, et al. “An image is worth one word: Personalizing text-to-image generation using textual inversion.” arXiv preprint arXiv:2208.01618 (2022).
Deliverables: Report and reproducible implementations
Prerequisites: Experience with Deep Learning, Computer Vision, PyTorch
Level: MS semester project
Type of work: 50% research, 50% implementation
Supervisors: Baran Ozaydin ([email protected])
Description: Neural Cellular Automata (NCA) models are a type of computational model that extends Conway’s Game of Life, a classic example of a cellular automaton. While the Game of Life operates on a grid of cells with discrete states (either “alive” or “dead”), NCA models operate on a multi-dimensional grid with a continuous range of states. NCA models combine the strengths of cellular automata and neural networks to create a powerful tool for simulating and understanding complex systems.
In an NCA model, the update rules for each cell’s state are determined by a neural network. The neural network takes as input the current states of the cell and its neighbors and produces an output that determines the new state of the cell. By incorporating neural networks into cellular automata models, NCA models have the potential to learn and simulate complex systems, such as biological or physical systems.
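A minimal NCA update step, in the spirit of the Growing NCA work referenced below, typically combines a fixed perception stage (identity + Sobel filters) with a small learned per-cell update rule and a stochastic update mask. The sketch below is one such variant; the channel counts and fire rate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalNCA(nn.Module):
    """One NCA step (sketch): fixed Sobel perception + 1x1-conv update rule + stochastic updates."""
    def __init__(self, channels=16, hidden=128):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
        ident = torch.zeros(3, 3); ident[1, 1] = 1.0
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])              # identity + x/y gradients
        self.register_buffer("kernels", kernels.repeat(channels, 1, 1).unsqueeze(1))
        self.rule = nn.Sequential(nn.Conv2d(channels * 3, hidden, 1), nn.ReLU(),
                                  nn.Conv2d(hidden, channels, 1))
        nn.init.zeros_(self.rule[-1].weight)                              # start as a "do nothing" rule

    def forward(self, state, fire_rate=0.5):
        perception = F.conv2d(state, self.kernels, padding=1, groups=state.shape[1])
        update = self.rule(perception)                                    # learned per-cell update
        mask = (torch.rand_like(state[:, :1]) < fire_rate).float()        # asynchronous, stochastic updates
        return state + update * mask
```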
If you’re interested in learning more about what Neural Cellular Automata (NCA) models can do, there are several examples you can check out. Some of the tasks that NCA models have been used for include generating images and textures, synthesizing videos, and even classifying images.
Here are a few references to explore:
Growing Neural Cellular Automata: https://distill.pub/2020/growing-ca/
Self-Organizing Textures: https://distill.pub/selforg/2021/textures/
DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
Self-classifying MNIST Digits: https://distill.pub/2020/selforg/mnist/
In this project, we will continue exploring NCA models and their potential for synthesizing images, videos, audio, and 3D objects. Together, we will brainstorm and select a specific topic for your project that aligns with your interests and goals. This project involves a lot of exploration and is an exciting opportunity to develop your research and critical thinking skills.
Deliverables:
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites:
- Python and PyTorch.
- Experience with Deep Learning methods and Convolutional Networks
Level: M.Sc. and B.Sc.
Supervisor: Ehsan Pajouheshgar (ehsan.pajouheshgar [at] epfl [dot] ch)
Description:
Diffusion models recently became state-of-the-art for generating images. Yet, some attributes are more challenging to generate than others: hands might be deformed with a wrong number of fingers, objects may have weird ends, objects can appear merged or duplicated, etc. [1, 2]. In this project, you will research the existing literature on reducing such artifacts and thereby improving the quality of images generated by diffusion models.
Tasks:
– Is this image generated with a diffusion model? Recently, several works have tried to detect images generated by diffusion models [3, 4, 5]. A first step of this project is to train a model to detect diffusion-generated images. You should ideally use the concepts of saliency maps / class-activation maps to produce results with visual explanations, highlighting areas of the image that contain diffusion artifacts (see the sketch after this list).
– Improving generated images. The main goal of this project is to improve the images generated with diffusion models. A possible solution is to use image-to-image translation methods [6, 7], translating images from the “generated images” domain to the “real images” domain. To further improve the results, you will propose a method that includes diffusion-specific architecture details, i.e., knowing that images are created gradually from noise.
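As a starting point for the detection task above, one could fine-tune a standard classifier and attach a Grad-CAM-style explanation map. The backbone choice and the two-class head below are assumptions for this sketch, not methods taken from [3-5].

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical real-vs-generated detector: ResNet-50 with a 2-way head (0 = real, 1 = generated).
model = models.resnet50(weights="DEFAULT")          # torchvision >= 0.13 assumed
model.fc = torch.nn.Linear(model.fc.in_features, 2)

feats = {}
def hook(module, inp, out):
    feats["act"] = out
    out.register_hook(lambda g: feats.update(grad=g))   # keep gradient of the last conv features
model.layer4.register_forward_hook(hook)

def gradcam(x, target_class=1):
    """Grad-CAM style map highlighting regions that drive the 'generated' prediction."""
    model.zero_grad()
    logits = model(x)                                   # gradients must be enabled here
    logits[:, target_class].sum().backward()
    weights = feats["grad"].mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * feats["act"]).sum(dim=1, keepdim=True))   # weighted activation map
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```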
References:
[1] See the channel “failed-diffusion” on the Stable Diffusion Discord, where people share their failures.
[2] Tangermann, V. (2022, October 24). We’re cry-laughing at these “spectacular failures” of ai generated art. Futurism. From https://futurism.com/the-byte/ai-generated-art-failures
[3] Coccomini, D. A., Esuli, A., Falchi, F., Gennaro, C., & Amato, G. (2023). Detecting Images Generated by Diffusers. arXiv preprint arXiv:2303.05275.
[4] Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., & Verdoliva, L. (2022). On the detection of synthetic images generated by diffusion models. arXiv preprint arXiv:2211.00680.
[5] Bird, J. J., & Lotfi, A. (2023). CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. arXiv preprint arXiv:2303.14126.
[6] Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
[7] Alami Mejjati, Y., Richardt, C., Tompkin, J., Cosker, D., & Kim, K. I. (2018). Unsupervised attention-guided image-to-image translation. Advances in neural information processing systems, 31.
Deliverables: At the end of the semester, the student should provide a framework for detecting diffusion-generated images and for improving the quality of diffusion-generated images. Deliverables should include code, well cleaned up and easily reproducible, as well as a written report, explaining the models, the steps taken for the project and the performances of the models.
Prerequisites: Python and PyTorch.
Level: MS research project
Number of students: 1
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Image-based rendering dates back to the 1990s. Unlike traditional computer graphics rendering, which requires explicit scene geometry and textures, image-based rendering renders a scene from observations of it, i.e., photographs taken in the real or synthesized scene.
Given an image collection of a scene under different viewing directions, NeRF can faithfully synthesize novel views that are 3D-consistent. However, it remains an open question how to render the scene under a novel lighting condition.
There are a few works on relighting a NeRF-like implicit scene representation. The process has two steps: scene decomposition and 3D scene extraction from the decomposition. Afterwards, we can relight the 3D representation easily. A common material model assumed for all objects of interest is the Disney BSDF model. However, there is a variety of other materials, such as glass or glints, that cannot be adequately represented under this assumption.
In this project, we will work with a dedicated type of surface appearance, explore various 3D scene representations, either implicit or explicit, and develop corresponding scene decomposition strategies that would then enable novel view synthesis and relighting.
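Once a scene is decomposed, relighting under the simplest (Lambertian) assumption reduces to re-evaluating a shading model with a new light; the sketch below shows exactly the baseline that breaks down for the glass and glint materials this project targets.

```python
import torch

def relight_lambertian(albedo, normals, light_dir, light_color):
    """Relight a decomposed scene under a Lambertian assumption (sketch).
    albedo: [..., 3], normals: [..., 3] unit vectors, light_dir: [3] unit vector, light_color: [3]."""
    cos = torch.clamp((normals * light_dir).sum(dim=-1, keepdim=True), min=0.0)   # n . l, clamped
    return albedo * light_color * cos                                             # diffuse shading only
```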
Type of work
- MS Level: semester project / master project (can be adapted to a bachelor project for a capable bachelor student)
- 60% research, 40% development
Prerequisite:
- Knowledge in deep learning frameworks (e.g., PyTorch or Tensorflow), image processing and Computer Vision.
- Experience with 3D vision is required.
- Knowledge with Computer Graphics is recommended.
Supervisor:
- Dongqing Wang, [email protected]
Reference Literature:
- NeRF: Neural Radiance Field https://www.matthewtancik.com/nerf
- NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis https://pratulsrinivasan.github.io/nerv/
- Neural Radiance Fields for Outdoor Scene Relighting https://4dqv.mpi-inf.mpg.de/NeRF-OSR/
- NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination https://arxiv.org/abs/2106.01970
Comics Projects
Visual computing on comics is a very exciting yet very challenging problem because of the lack of annotated data, domain shifts, and intra-domain variability. To tackle this problem, we aim to use segmentation, saliency detection, and depth estimation, along with style transfer. In the projects that follow, you will broaden your skills by learning how to use state-of-the-art computer vision algorithms on a customized comics dataset. We aim to publish the results in upcoming venues.
Lack of annotated data
The lack of annotated data in the comics domain makes it difficult to train machine learning models for vision tasks such as image classification, segmentation, depth prediction, and saliency prediction. Without annotated data, it is challenging for the model to learn the essential features and patterns to perform the task accurately.
Domain gap
The comics domain has unique properties that are not present in the training data (natural images), resulting in a domain gap between the training data and the comics data. This leads to poor performance and difficulty generalizing when a model is applied to comics data.
Variability within the domain
Different artists and publishers often use unique visual styles and techniques when producing comics. For instance, one artist may draw a chair substantially differently from another, resulting in a large variance in the object's appearance. This type of discrepancy is uncommon in natural images. It is therefore challenging for a model to accurately perform tasks such as image classification or segmentation, since it must group objects that seem to belong to distinct categories.
Description:
Salient regions are the areas that stand out compared to their surroundings. Saliency prediction aims to predict which areas attract the most attention. However, unlike image classification, the saliency prediction task lacks large-scale annotated datasets. Current approaches for saliency estimation construct ground truth from eye-tracking data on natural images. In our project, however, we will perform saliency detection on comics pages instead of natural images.
In this project, you will explore the existing saliency prediction methods on comics stimuli. You will test and improve a method proposed for saliency prediction using inter-object relationships. Then, you will evaluate your model by comparing it to state-of-the-art saliency detection models.
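Evaluation against ground-truth saliency maps typically relies on distribution-based metrics; two common ones are sketched below (inputs are 2D arrays holding the predicted and ground-truth maps).

```python
import numpy as np

def kl_divergence(pred, gt, eps=1e-8):
    """KL divergence between predicted and ground-truth saliency maps, treated as distributions."""
    pred = pred / (pred.sum() + eps)
    gt = gt / (gt.sum() + eps)
    return float((gt * np.log(gt / (pred + eps) + eps)).sum())

def correlation_coefficient(pred, gt):
    """Pearson correlation (CC) between the two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())
```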
Tasks:
– Understand the literature and state-of-art
– Test existing methods for saliency prediction on comics
– Improve a saliency prediction method proposed by using object-character interactions
– Evaluate state-of-the-art saliency detection models on comics data
Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in deep learning frameworks
Deliverables:
Reproducible code and a written report
Level:
MS semester or thesis project
Type of work:
60% research, 40% development and testing
References:
[1] K. Bannier, E. Jain, and O. Le Meur, “DeepComics: Saliency estimation for comics.”
[2] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
[3] D. V. Ruiz, B. A. Krinski, and E. Todt, “IDA: Improved Data Augmentation Applied to Salient Object Detection,” 2020 SIBGRAPI.
Supervisor:
Bahar Aydemir ([email protected])
Description:
Visual saliency refers to the parts of a scene that capture our attention. Current approaches for saliency estimation construct ground truth from eye-tracking data on natural images. In our project, however, we will analyze eye-tracking data collected on comics pages instead of natural images. Later, we will use the collected data to estimate saliency in the comics domain.
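Ground-truth saliency maps are usually built by accumulating fixation points and blurring them with a Gaussian; a minimal sketch follows (the sigma value is an assumption that should be matched to the viewing setup).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, height, width, sigma=25):
    """Turn raw eye-tracking fixations into a continuous saliency map (sketch).
    `fixations`: list of (x, y) coordinates on the comics page."""
    fmap = np.zeros((height, width), dtype=np.float32)
    for x, y in fixations:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            fmap[int(y), int(x)] += 1.0                 # accumulate fixation counts
    blurred = gaussian_filter(fmap, sigma=sigma)        # spread each fixation over its neighborhood
    return blurred / (blurred.max() + 1e-8)             # normalized ground-truth saliency map
```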
Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an analysis of eye tracking data according to given instructions.
Deliverables: At the end of the semester, the student should provide the code and a report of the work.
Type of work: 20% research, 80% development and testing
References:
[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 workshop on ”Future of Datasets”, 2015.
[2] K. Kunze, Y. Utsumi, Y. Shiga, K. Kise, and A. Bulling, “I know what you are reading: recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, September 8-12, 2013, Zurich, Switzerland.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
Level: BS semester project
Supervisor: Bahar Aydemir ([email protected])
It is well known that deep learning models need to be trained with huge amounts of data. To circumvent this limitation when only a small dataset is available, models are pre-trained on large datasets and then fine-tuned on the few available labels from the target domain. The models are usually trained on ImageNet or even larger datasets, either with labels or in a self-supervised manner. Although these models provide strong baselines, the performance of different training schemes varies across scenarios, and there is no guideline for choosing which model to use.
In our project, we will investigate this problem with a novel comic dataset. We will explore a large array of models pre-trained on different datasets and with different objectives, both supervised and diverse self-supervised models, and then fine-tune them with our own labeled comic data. Through a thorough analysis of the results, we will provide insights into finding the best model for transfer learning. At the end of the semester, you should therefore be able to give an overview of the existing transfer-learning methods, show how they perform on our comic dataset, and theorize on their applicability beyond a single domain.
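A minimal sketch of the fine-tuning setup to be compared across pre-training schemes; the ResNet-50 backbone, the number of comic classes, and the linear-probe flag are placeholders (a self-supervised checkpoint would be loaded in place of the ImageNet weights for comparison).

```python
import torch
from torchvision import models

num_comic_classes = 10                                   # placeholder for our labeled comic data
model = models.resnet50(weights="DEFAULT")               # supervised ImageNet weights (torchvision >= 0.13)
model.fc = torch.nn.Linear(model.fc.in_features, num_comic_classes)   # new task-specific head

linear_probe = True
if linear_probe:
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("fc.")         # train only the new head; else fine-tune everything
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```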
Task:
– Evaluate differently pre-trained models on comic data.
– Pre-train the model in a self-supervised learning manner on comic data directly.
– Establish the possibilities and limitations of transfer learning on comic data and other domains.
Prerequisites:
Knowledge of Python, a deep learning framework (TensorFlow or PyTorch), and linear algebra is required.
Level:
MS project
Type of work:
20% literature review, 40% research, 40% development and test.
References:
[1] RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank, Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun
[2] Dive into Self-Supervised Learning for Medical Image Analysis: Data, Models and Tasks, Chuyan Zhang, Yun Gu
[3] Rethinking ImageNet Pre-training, Kaiming He, Ross Girshick, Piotr Dollár
[4] Masked Autoencoders Are Scalable Vision Learners, Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Supervisors: Peter Grönquist ([email protected]) , Tong Zhang([email protected])
Description (Master Semester Project open to EPFL students)
In this project, you will do a comparative analysis of the existing literature on monocular depth estimation while applying it to comics images. The state-of-the-art monocular depth estimation methods [1] [2] [3] are designed for natural photographs, and when applied to comics images they severely underperform. In this project you would (a) apply the existing depth methods directly to the comics images, and (b) use an existing translation method to translate the comics images to real ones and then apply the existing depth baselines [4]. Following this, you will analyse the results qualitatively and quantitatively.
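The quantitative part of the analysis relies on standard depth error metrics; a minimal sketch of two of them (absolute relative error and RMSE) computed on valid ground-truth pixels:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular-depth error metrics (sketch)."""
    mask = gt > eps                                    # ignore invalid / missing ground truth
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)          # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))          # root mean squared error
    return {"abs_rel": float(abs_rel), "rmse": float(rmse)}
```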
You may contact the supervisor at any time should you want to discuss the idea further.
References
[1] Monocular Depth Estimation: A Survey https://arxiv.org/abs/1901.09402
[2] Monocular Depth Estimation Based On Deep Learning: An Overview https://arxiv.org/abs/2003.06620
[3] Monocular Depth Estimation Using Deep Learning: A Review https://www.mdpi.com/1424-8220/22/14/5353/pdf
[4] Estimating Image Depth in the Comics Domain https://arxiv.org/abs/2110.03575
Type of Work (e.g., theory, programming)
30% research, 70% implementation and testing
Prerequisites
Good experience in deep learning, experience in Python, Pytorch. Experience in statistical analysis (finding absolute errors or RMSE between the prediction and the ground-truth) to report the performance evaluations of the models.
Models will run on RunAI (we will guide you on how to use RunAI; no prior knowledge required).
Supervisor
Deblina BHATTACHARJEE ([email protected])
Description (Master Semester Project or Master Thesis Project open to EPFL students)
In this project, you will research the existing literature on multi-task learning and self-supervised approaches [see 1]. You will then read the literature on how to use these two in conjunction and implement the method [see 2]. You will have to think of a novel idea to improve on this method, for which you will receive guidance from your supervisor. Thereafter, you will evaluate your method against the existing baselines. In doing so, you will learn about the various approaches to improve joint training on dense vision tasks.
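As a rough illustration of joint training on dense tasks, a shared backbone with per-task heads and a weighted sum of losses might look as follows; the specific tasks, backbone, and weights are placeholders (see [1] for the actual design space).

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Shared backbone with per-task dense heads (sketch)."""
    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone                               # e.g., a ViT or ResNet encoder
        self.seg_head = nn.Conv2d(feat_dim, n_classes, 1)      # dense task 1: segmentation
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)            # dense task 2: depth

    def forward(self, x):
        feats = self.backbone(x)                               # [B, feat_dim, H', W'] assumed
        return self.seg_head(feats), self.depth_head(feats)

def multitask_loss(seg_logits, seg_gt, depth_pred, depth_gt, w_seg=1.0, w_depth=1.0):
    seg_loss = nn.functional.cross_entropy(seg_logits, seg_gt)
    depth_loss = nn.functional.l1_loss(depth_pred, depth_gt)
    return w_seg * seg_loss + w_depth * depth_loss             # weighted joint-training objective
```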
Bonus (applicable to Master thesis students only): We will work with transformer frameworks, and working on the explainability/interpretability of transformers will fetch you extra points.
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] Multi-Task Learning for Dense Prediction Tasks: A Survey, Simon Vandenhende et.al.
[2] Multi-task Self-Supervised Visual Learning, Doersch and Zisserman.
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, experience in Python, Pytorch. Experience in statistical analysis to report the performance evaluations of the models.
Models will run on RunAI (we will guide you on how to use RunAI; no prior knowledge required).
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description: Diffusion models recently became state-of-the-art for generating images. These models are trained as follows: images from the training set are deteriorated with Gaussian noise at different levels, and the model is tasked to gradually denoise them. At inference time, to generate an image, one starts from noise and gradually denoises it. In the particular case of generating images with text-to-image diffusion models, one commonly relies on classifier-free guidance [1] to improve the alignment of the generated image with the textual prompt. Another possible way to improve this text-image alignment is to use CLIP guidance [2, “classifier guidance” in 3]. The first solution, classifier-free guidance, requires two forward passes at each inference step, one with the textual prompt and one without, which doubles the time needed to generate an image. The second solution, CLIP guidance, is rarely used in practice with text-to-image diffusion models, because it requires additional forward and backward passes through the CLIP model, which makes image generation significantly slower, e.g., 40 seconds instead of 5 seconds. In this project, you will use these guidance terms at training time instead of inference time. You will therefore fine-tune diffusion models with additional terms in the loss.
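For reference, classifier-free guidance at inference time combines a conditional and an unconditional noise prediction, which is exactly the double forward pass this project aims to move into training; `eps_model` and the embeddings below are placeholders for the chosen text-to-image model.

```python
def classifier_free_guidance(eps_model, x_t, t, text_emb, null_emb, w=7.5):
    """Classifier-free guidance at inference time (sketch): two forward passes per denoising step."""
    eps_cond = eps_model(x_t, t, text_emb)      # pass conditioned on the textual prompt
    eps_uncond = eps_model(x_t, t, null_emb)    # pass conditioned on the empty / null prompt
    return eps_uncond + w * (eps_cond - eps_uncond)   # guided noise estimate used by the sampler
```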
Tasks:
– Distillation of classifier-free guidance. You will propose modifications to the training loop of diffusion models to distill the classifier-free guidance at training time instead of inference time. You will then fine-tune a pretrained diffusion model with these modifications and evaluate the results along several axes: generation time, image quality, and text-image alignment.
– Feed-forward CLIP-guided diffusion. You will propose modifications to the training loop of diffusion models to account for CLIP-guidance at training time. You will then fine-tune a pretrained diffusion model with these modifications and evaluate the results along several axes: generation time, image quality, and text-image alignment.
References:
[1] Ho, J., & Salimans, T. (2022). Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.
[2] Suraj Patil (2022). CLIP guidance with Stable Diffusion. From https://github.com/huggingface/diffusers/blob/main/examples/community/clip_guided_stable_diffusion.py
[3] Dhariwal, P., & Nichol, A. (2021). Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34, 8780-8794.
Deliverables: At the end of the semester, the student should provide a framework to fine-tune and evaluate text-to-image models with feed-forward guidance. Deliverables should include code, well cleaned up and easily reproducible, as well as a written report, explaining the models, the steps taken for the project and the performances of the models.
Prerequisites: Python and PyTorch.
Level: BS or MS research project
Number of students: 1
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)