If you are interested in doing a research project (“semester project”) or a master’s project at IVRL, you can do so through the Master’s Programs in Communication Systems or in Computer Science. Note that you must be accredited at EPFL. This page lists the available semester/master’s projects for the Fall 2025 semester. The order of the projects is random.
For any other type of application (research assistantship, internship, etc.), please check this page.
Startup company Innoview has developed a software framework to create hidden watermarks printed on paper and to acquire and decode them with a smartphone. The acquisition by smartphone comprises many separate parametrizable parts. The project consists of improving some of these parts of the acquisition pipeline in order to optimize the recognition rate of the hidden watermarks (under Android).
Deliverables:
- Report and running prototype.
Prerequisites:
- Basic knowledge of image processing and computer vision,
- Coding skills in Java (Android), C#, and/or Matlab
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview SĂ rl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27
Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. The goal is to improve these security features and to optimize them by simulating the interaction of light with these 3D lenslet structures, using the Blender software.
Deliverables:
- Report and running prototype (Matlab). Blender lenslet simulations.
Prerequisites:
- Knowledge of computer graphics and of the interaction of light with 3D mesh objects,
- Basic knowledge of Blender,
- Coding skills in Matlab
Level: BS or MS semester project
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27
Dr Romain Rossier, Innoview SĂ rl, [email protected], tel 078 664 36 44
Startup company Innoview has developed arrangements of transparent lenslets and of opaque structures that yield interesting moiré effects.
The goal is to create plastic objects composed of a revealing layer made of transparent lenses and a base layer made of partly opaque structures. The superposition of the two layers shows interesting moiré evolutions. Once created as 3D volumes, their appearance can be simulated in Blender. After simulation and verification, these objects are to be printed on a 3D printer.
Deliverables:
- Report and running prototype (Matlab). Blender lenslet simulations. Fabricated 3D objects showing the moiré evolutions.
Prerequisites:
1. Good knowledge of computer graphics, especially the construction of 3D mesh objects,
2. Basic knowledge of Blender,
3. Good coding skills in Matlab
Level: BS or MS semester project, master’s project
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27
Dr Romain Rossier, Innoview SĂ rl, [email protected], tel 078 664 36 44
Introduction:
Images captured under low-light conditions often suffer from significant noise. Existing deep-learning-based denoising networks [1,2,3,4] typically require a large dataset of paired noisy-clean samples for effective training. However, collecting such paired data is both labor-intensive and time-consuming. This project aims to address this challenge by synthesizing paired noisy-clean data using generative models, such as diffusion models [5]. Noise in RAW images can be broadly classified into signal-dependent and signal-independent components. We plan to model these components separately and then combine them to simulate realistic noise for clean images.
Objective:
The primary goal of this project is to develop a robust method for noise synthesis that enables the generation of high-quality paired data. This synthesized data will be used to train denoising networks, allowing us to evaluate its impact on denoising performance.
Methodology:
- Modeling Signal-Independent Noise with Generative Models: Using available dark frame datasets, we will train a generative model (e.g., diffusion models) to learn the distribution of signal-independent noise.
- Simulating Signal-Dependent Noise with Statistical Analysis: For the signal-dependent noise component, we will use the Poisson noise model, which effectively captures the particle-like nature of light. This approach is well-supported by existing research [6,8].
- Combining Noise Components: By merging the signal-dependent and signal-independent noise components, we will synthesize realistic noisy images from clean ones, enabling us to generate an unlimited number of paired noisy-clean samples (a minimal sketch of this step is given after this list).
- Evaluating Synthetic-Noise Quality: We will train a denoising network on our synthetic data pairs and compare its denoising performance with that of a network trained on real paired data.
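Below is a minimal sketch of the synthesis step, assuming a calibrated system gain and some sampler for the signal-independent component; the `read_noise_sampler` interface and the Gaussian stub standing in for the trained diffusion model are hypothetical placeholders, not part of an existing codebase.

```python
import torch

def synthesize_noisy_raw(clean_raw, system_gain, read_noise_sampler):
    """Sketch: combine Poisson (signal-dependent) shot noise with a sampled
    signal-independent component to turn a clean RAW image into a noisy one."""
    # Signal-dependent part: photon shot noise is Poisson-distributed on the
    # expected electron count; the system gain maps electrons to digital values.
    electrons = clean_raw / system_gain
    shot_noisy = torch.poisson(electrons) * system_gain

    # Signal-independent part: sampled from a model trained on dark frames
    # (read noise, banding, fixed-pattern noise, ...).
    read_noise = read_noise_sampler(clean_raw.shape)
    return shot_noisy + read_noise

# Toy usage; a Gaussian stub stands in for the trained diffusion sampler.
clean = torch.rand(1, 4, 256, 256) * 1000.0            # clean RAW-like data (Bayer planes)
gaussian_stub = lambda shape: 2.0 * torch.randn(shape)
noisy = synthesize_noisy_raw(clean, system_gain=4.0, read_noise_sampler=gaussian_stub)
```

In the project, the stub would be replaced by samples drawn from the generative model trained on dark frames.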
Type of work:
master semester project
65% research, 35% development
Deliverables:
- Well-documented code including signal-dependent noise analysis & modeling.
- A trainable implementation of the proposed diffusion framework that could model signal-independent noise.
- Two denoising networks, one trained on real data pairs and one on your own synthetic data pairs
- A final report detailing the methodology, experiments, results, and analysis
Prerequisites:
Proficiency in coding with deep learning frameworks (e.g., PyTorch)
Familiarity with image processing and computer vision fundamentals
Prior knowledge of diffusion models is advantageous
Supervisor:
Liying Lu ([email protected])
Reference:
[1]. Abdelhamed, Abdelrahman, Stephen Lin, and Michael S. Brown. “A high-quality denoising dataset for smartphone cameras.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[2]. Anaya, Josue, and Adrian Barbu. “RENOIR – A dataset for real low-light image noise reduction.” Journal of Visual Communication and Image Representation 51 (2018): 144-154.
[3]. Chen, Chen, et al. “Learning to see in the dark.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[4]. Flepp, Roman, et al. “Real-World Mobile Image Denoising Dataset with Efficient Baselines.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[5]. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” Advances in neural information processing systems 33 (2020): 6840-6851.
[6]. Wei, Kaixuan, et al. “Physics-based noise modeling for extreme low-light photography.” IEEE Transactions on Pattern Analysis and Machine Intelligence 44.11 (2021): 8520-8537.
[7]. Costantini, Roberto, and Sabine Susstrunk. “Virtual sensor design.” Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications V. Vol. 5301. SPIE, 2004.
[8]. Zhang, Yi, et al. “Rethinking noise synthesis and modeling in raw denoising.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Introduction:
In this project, we will develop a learning-based white balancing algorithm. White balancing [1,2,3,4,5] is the process of estimating and correcting the influence of the scene illuminant in an image to restore the perceived true colors of objects. The goal is to ensure that object colors are consistent with how they would look under a canonical white light source. This typically involves estimating the scene illuminant and applying a corresponding correction to the image.
As shown in the Color by Correlation paper [6], different illuminants produce distinct chromaticity statistics. By computing the correlation between the chromaticity distribution of an input image and known illuminant distributions, the algorithm can identify the most likely scene illuminant. However, this method relies on a manually designed, binned chromaticity space; such handcrafted representations may lack the expressiveness or discriminative power needed to accurately model the complex relationship between image colors and scene illuminants.
In this project, we aim to replace the manually designed statistical space with a learned deep feature space that is compact, discriminative, and better suited to capturing the complex relationship between image colors and the scene illuminant. Both the input image and candidate illuminants will be encoded into this deep space, and their similarity will be measured through correlation. The illuminant with the highest correlation score will be chosen as the estimated scene illuminant.
Objective:
The primary goal of this project is to design and implement a learning-based white balancing algorithm by constructing a deep feature space suitable for scene illuminant estimation. The deep feature space should effectively represent both the image content and the illuminant priors, enabling accurate estimation through feature correlation.
Methodology:
- Investigate the most informative input representation for feature learning (e.g., raw RGB image vs. chromaticity histogram).
- Design and train a model to construct a deep feature space from the chosen input representation. One possible approach is to use a VQ-VAE-style embedding [7,8] to learn a bank of latent codes representing illuminant priors.
- For each input image:
- Convert it into the chosen input representation.
- Encode it into the deep feature space.
- Compute correlation scores between the image feature and each illuminant code in the feature bank.
- Select the illuminant with the highest score as the estimated scene illuminant (see the sketch after this list).
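To make the correlation step concrete, here is a minimal sketch of the selection logic, assuming the input image has already been encoded into the learned feature space and that a bank of illuminant codes with their associated illuminant colors is available; all tensor names and shapes are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def estimate_illuminant(image_feature, illuminant_codes, illuminant_rgbs):
    """Sketch of correlation-based illuminant selection in a learned feature space.

    image_feature:    (D,) deep feature of the input image
    illuminant_codes: (N, D) learned bank of illuminant embeddings (e.g. VQ-VAE codes)
    illuminant_rgbs:  (N, 3) candidate illuminant colors associated with each code
    """
    # Cosine similarity between the image feature and every illuminant code.
    scores = F.cosine_similarity(image_feature.unsqueeze(0), illuminant_codes, dim=1)
    best = torch.argmax(scores)
    return illuminant_rgbs[best], scores

# Toy usage with random tensors standing in for the learned components.
feat, codes, rgbs = torch.randn(128), torch.randn(64, 128), torch.rand(64, 3)
est_rgb, scores = estimate_illuminant(feat, codes, rgbs)
```

White balance correction would then divide the image by the estimated illuminant color, channel by channel.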
Type of work:
Bachelor/master semester project
65% research, 35% development
Deliverables:
- Well-documented code for input representation analysis and deep feature space construction
- A trainable implementation of the proposed deep white balancing framework
- A final report detailing the methodology, experiments, results, and analysis
Prerequisites:
Proficiency in deep learning frameworks (e.g., PyTorch)
Familiarity with image processing and computer vision fundamentals
Supervisor:
Liying Lu ([email protected])
Reference:
[1]. Barron, Jonathan T. “Convolutional color constancy.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
[2]. Barron, Jonathan T., and Yun-Ta Tsai. “Fast fourier color constancy.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[3]. Afifi, Mahmoud, et al. “Cross-camera convolutional color constancy.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[4]. Afifi, Mahmoud, Marcus A. Brubaker, and Michael S. Brown. “Auto white-balance correction for mixed-illuminant scenes.” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022.
[5]. Kim, Dongyoung, et al. “Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[6]. Finlayson, Graham D., Steven D. Hordley, and Paul M. Hubel. “Color by correlation: A simple, unifying framework for color constancy.” IEEE Transactions on Pattern Analysis and Machine Intelligence 23.11 (2001): 1209-1221.
[7]. Van Den Oord, Aaron, and Oriol Vinyals. “Neural discrete representation learning.” Advances in neural information processing systems 30 (2017).
[8]. Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
Description:
Diffusion models [1] have become the new paradigm for generative modeling in computer vision. Recent works demonstrate that these models encode rich spatial information about the input image [2,3], showing an emerging spatial reasoning capability of diffusion models without supervision. However, there has been no effort to examine the quality of this spatial information. Whether the spatial information about the image, e.g., the location of the object of interest, is correct remains an open question. In this project, we will first develop methods for quantifying the quality of the encoded spatial information by comparing it with human saliency data. Evidence suggests that different layers in diffusion models can differ significantly in feature quality [4]. We will therefore consider multiple diffusion models and spatial-information extraction methods and benchmark them.
Building on this quality quantification method, we will pick the best model for extracting spatial information from existing image benchmarks such as ImageNet. This spatial information can be used as an additional supervision signal when training downstream student networks for discriminative tasks. Intuitively, learning where to attend for visual perception tasks, such as image classification, can be as important as learning what to attend to [5].
This is an exploratory project. We will try to interpret the black box of the diffusion model and mine the spatial semantic information that it encodes. Together, we will also brainstorm applications of diffusion models other than image generation.
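As one concrete starting point for the quantification step, the sketch below compares a spatial map extracted from a diffusion model (e.g., an aggregated cross-attention map; the extraction itself is not shown) with a human saliency map using the Pearson correlation coefficient, a standard saliency agreement metric. The inputs are hypothetical placeholders.

```python
import torch

def saliency_cc(model_map, human_map, eps=1e-8):
    """Pearson correlation coefficient (CC) between a model-derived spatial map
    and a human saliency map of the same spatial size (both 2D tensors)."""
    m = (model_map - model_map.mean()) / (model_map.std() + eps)
    h = (human_map - human_map.mean()) / (human_map.std() + eps)
    return (m * h).mean()

# Toy usage with random maps standing in for real extractions and annotations.
cc = saliency_cc(torch.rand(64, 64), torch.rand(64, 64))
```

Other standard saliency metrics (e.g., NSS or AUC variants) could be benchmarked alongside CC.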
References:
[1] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
[2] Hertz, Amir, et al. “Prompt-to-Prompt Image Editing with Cross-Attention Control.” The Eleventh International Conference on Learning Representations.
[3] Couairon, Guillaume, et al. “DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance.” ICLR 2023 (Eleventh International Conference on Learning Representations). 2023.
[4] Meng, Benyuan, et al. “Not all diffusion model activations have been evaluated as discriminative features.” Advances in Neural Information Processing Systems 37 (2024): 55141-55177.
[5] Choi, Minkyu, et al. “A dual-stream neural network explains the functional segregation of dorsal and ventral visual pathways in human brains.” Advances in Neural Information Processing Systems 36 (2023): 50408-50428.
Deliverables: Deliverables should include code that is well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the results.
Prerequisites: Python and PyTorch. Basic understanding of diffusion models.
Level: MS research project
Number of students: 1
Contact: Yitao Xu, [email protected]
If you are interested in this project, please send your CV and transcript to the contact person.
Description:
Diffusion models [1] excel at image generation tasks. Certain parametrizations of the diffusion process allow inversion [2,3] back to the input image. Unlike the normal forward diffusion process, inversion retains the information of the input image along the diffusion trajectory, which supports downstream tasks such as image editing [4] or the extraction of semantic information from diffusion models. However, most existing approaches rely on DDIM [2] for inversion, which requires a large number of diffusion steps to achieve high-quality inversions and reconstructions. This makes the inversion process computationally expensive and limits its practicality. The goal of this project is to implement and benchmark alternative inversion processes based on more efficient diffusion samplers [5] within the Hugging Face diffusers library [6]. These samplers have been shown to significantly accelerate generation while maintaining quality.
This project requires both software engineering skills and an understanding of diffusion models. The starting point will be the existing DDIMInverseScheduler in the diffusers library, which will serve as a baseline. Comprehensive testing of the implementation is necessary to ensure its reliability before contributing it back to the open-source library.
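For orientation, the sketch below writes out the deterministic DDIM inversion update from [2], roughly what the DDIMInverseScheduler baseline realizes behind the usual scheduler interface; the `eps_model` call signature is a hypothetical placeholder, and this is not the diffusers implementation itself.

```python
import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alphas_cumprod, timesteps):
    """Sketch of deterministic DDIM inversion (cf. [2]).

    x0:             latent/image to invert
    eps_model:      noise-prediction network, eps_model(x, t) -> predicted noise
    alphas_cumprod: 1D tensor of cumulative alpha-bar values per timestep
    timesteps:      increasing timesteps, e.g. [0, 20, 40, ..., 980]
    """
    x = x0
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(x, t_cur)
        # Predict x0 at the current noise level, then re-noise it to the next level.
        x0_pred = (x - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1.0 - a_next).sqrt() * eps
    return x
```

The project would implement analogous inverse steps for the faster samplers of [5] and compare reconstruction quality and runtime against this baseline.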
References:
[1] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
[2] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising Diffusion Implicit Models.” International Conference on Learning Representations.
[3] Zhang, Guoqiang, Jonathan P. Lewis, and W. Bastiaan Kleijn. “Exact diffusion inversion via bidirectional integration approximation.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
[4] Couairon, Guillaume, et al. “DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance.” ICLR 2023 (Eleventh International Conference on Learning Representations). 2023.
[5] Karras, Tero, et al. “Elucidating the design space of diffusion-based generative models.” Advances in Neural Information Processing Systems 35 (2022): 26565-26577.
[6] Patrick von Platen et al., Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers
Deliverables: Deliverables should include code that is well cleaned up and easily reproducible, as well as a written report explaining the code, the steps taken during the project, and the results.
Prerequisites: Python and PyTorch. Basic understanding of diffusion models.
Level: BS semester project
Number of students: 1
Contact: Yitao Xu, [email protected]
If you are interested in this project, please send your CV and transcript to the contact person.
Film photography’s distinctive “look” is partly due to its ability to record and compress light information of high dynamic range, especially in the highlights, without clipping [1]. By preserving subtle gradations in highlight and shadow areas and compressing them, film naturally reveals rich color nuances, which is a key contributor to its signature aesthetic.
Digital film emulation has become increasingly popular, but most applications (e.g., Dazz, Dehancer, VSCO) assume the availability of high-quality captures while in practice working off images from relatively limited consumer camera sensors. These images tend to have a low dynamic range and lose the highlight and shadow detail that film retains, making it impossible for current emulators to reproduce nuanced tones via compression.
Our preliminary validation has confirmed that high dynamic range (HDR) data significantly improves the quality of film simulation, particularly in preserving the characteristic highlight roll-off and shadow detail that define authentic film aesthetics. This validation establishes the critical importance of recovering lost dynamic range information before applying film simulation techniques.
Type of work:
- MS Level: master’s project
- 100% Research
Approach
Building on our validated hypothesis, this project will develop a deep learning framework that recovers high dynamic range RAW-equivalent images from standard RGB inputs captured by consumer-grade sensors. We will extend the state-of-the-art RAW-Diffusion model [2] by training it on carefully designed synthetic datasets that specifically target highlight and shadow reconstruction.
Our approach involves:
- Synthetic Training Data Generation: Creating paired datasets of clipped RGB images and their corresponding full dynamic range RAW data, with special emphasis on highlight and shadow regions (a data-generation sketch is given below)
- Model Architecture Extension: Adapting RAW-Diffusion’s diffusion-based architecture to focus on reconstructing missing information in over/underexposed regions
- Film Simulation Pipeline Integration: Feeding the reconstructed HDR data into physically-accurate film simulation models to achieve authentic film characteristics
The final framework should enable consumer cameras to produce images with the nuanced highlight compression, smooth tonal transitions, and rich color gradations characteristic of analog film.
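A minimal sketch of the data-generation step referenced above is shown here; the exposure, white level, and bit depth are hypothetical parameters used to simulate sensor clipping and quantization, and a full dataset design would follow the camera pipeline more closely, in the spirit of unprocessing approaches [3,4].

```python
import torch

def make_training_pair(hdr_raw, exposure=1.0, white_level=1.0, bit_depth=8):
    """Sketch: derive a clipped, quantized low-dynamic-range input from a
    full-dynamic-range RAW-like image, keeping the original as the target.

    hdr_raw:     linear, full-dynamic-range tensor (values may exceed 1.0)
    exposure:    simulated exposure gain
    white_level: clipping point of the simulated sensor
    bit_depth:   quantization applied to the clipped input
    """
    exposed = hdr_raw * exposure
    clipped = exposed.clamp(0.0, white_level)                          # highlights clip here
    levels = 2 ** bit_depth - 1
    quantized = torch.round(clipped / white_level * levels) / levels   # shadow detail is lost
    return quantized, hdr_raw                                          # (network input, target)

# Toy usage on random values standing in for scene radiance.
hdr = torch.rand(3, 256, 256) * 4.0
ldr_input, hdr_target = make_training_pair(hdr, exposure=1.5)
```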
Prerequisites
- Proficiency in Python and experience with PyTorch.
- Familiarity with digital imaging pipelines and RAW image formats.
- Interest in photography and knowledge of film characteristics.
Supervisor
Zhuoqian (Zack) Yang, [email protected]
References
[1] Attridge, G. G. “The characteristic curve.” The Journal of photographic science 39.2 (1991): 55-62.
[2] Reinders, Christoph, et al. “RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation.” arXiv preprint arXiv:2411.13150 (2024).
[3] Brooks, Tim, et al. “Unprocessing images for learned raw denoising.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
[4] Zamir, Syed Waqas, et al. “Cycleisp: Real image restoration via improved data synthesis.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[5] Kim, Woohyeok, et al. “Paramisp: learned forward and inverse ISPS using camera parameters.” arXiv preprint arXiv:2312.13313 (2023).
Description
Film photography continues to thrive among enthusiasts and professionals who value its unique aesthetic qualities. While film remains popular, the process of digitizing film negatives has become increasingly challenging due to the stagnation of the consumer film scanner industry. Consumer film scanners use outdated sensors, motivating photographers to use digital cameras for scanning: a superior but technically complex alternative [1].
Color negative inversion requires specialized algorithms (as compared to standard software like Lightroom or Photoshop) due to fundamental differences between film and digital sensors [2]. Current methods demand technical expertise and a dedicated measurement process for film characteristics (Dmin, Dmax, characteristic curves) [3], creating barriers for amateur photographers.
Our preliminary experiments show that statistical analysis can automatically estimate the essential parameters from the scanned images themselves, eliminating the need for direct density measurements while maintaining quality. The prototype software has been well received by amateur and professional photographers.
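As an illustration only (not the prototype’s actual method), the sketch below shows one possible form of such a statistical estimate: per-channel density percentiles serve as robust stand-ins for Dmin and Dmax before a simple inversion. A full pipeline would additionally account for the film’s characteristic curve and tone reproduction [2,3].

```python
import numpy as np

def invert_negative(scan, low_pct=0.1, high_pct=99.9):
    """Sketch: estimate per-channel film-base (Dmin-like) and maximum-density
    (Dmax-like) points from percentiles of a linear scan, then invert.

    scan: float RGB array of a linear camera scan of the negative, values in (0, 1]
    """
    density = -np.log10(np.clip(scan, 1e-4, None))           # transmission -> density
    d_min = np.percentile(density, low_pct, axis=(0, 1))     # film base + fog estimate
    d_max = np.percentile(density, high_pct, axis=(0, 1))    # densest (scene-white) areas
    # Normalize each channel's density range; density grows with scene brightness,
    # so the normalized density already reads as a positive image.
    norm = (density - d_min) / (d_max - d_min + 1e-6)
    return np.clip(norm, 0.0, 1.0)
```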
Type of work:
- Bachelor Level: Semester Project
- 100% Development
Approach
This project will continue the development of a toolkit that uses statistical methods to automatically estimate film parameters. The student will extend it with:
- Multi-Image Parameter Estimation: Statistical aggregation across multiple frames for improved accuracy
- Batch Processing Pipeline: Efficient whole-roll processing with frame consistency
- Faster RAW Image Loading: Incorporating the RawSpeed library [4]
- OpenGL-based GUI: Responsive interface with real-time preview
- Algorithm Enhancement (optional): Advanced statistical methods.
Prerequisites
- Strong programming skills in Python / C++
- Familiarity with version control (Git) and collaborative development
- Helpful but not required:
- Experience with the analog processes
- Knowledge of computational photography and color science
Supervisor
Zhuoqian (Zack) Yang, [email protected]
References
[1] Tran, A. (2016, March 7). How to digitise film negatives using a DSLR. Ant Tran Blog. https://www.anttran.com/blog/2016/3/7/how-to-digitise-negatives-using-a-dslr
[2] Hunt, R. W. G. (1995). The reproduction of colour (5th ed.). Fountain Press.
[3] Patterson, R. (2001, October 2). Understanding Cineon. Illusion Arts. http://www.digital-intermediate.co.uk/film/pdf/Cineon.pdf
[4] darktable-org. (n.d.). RawSpeed [Computer software]. GitHub. https://github.com/darktable-org/rawspeed