Available Projects – Spring 2026

If you are interested in doing a research project (“semester project”) or a master’s project at IVRL, you can do this through the Master’s Programs in Data Science or in Computer Science. Note that you must be enrolled at EPFL. This page lists available semester/master’s projects for the Spring 2026 semester. The order of the projects is random.

For any other type of application (research assistantship, internship, etc.), please check this page.

Description

Recent work on weight space representation learning [1] has demonstrated that multiplicative LoRA (mLoRA) weights exhibit remarkable structural properties: when combined with asymmetric masking, mLoRA weights converge to linear modes during optimization, meaning different random initializations lead to nearly identical weight configurations. This linear mode connectivity, coupled with preserved channel alignment and semantic structure, makes mLoRA an ideal candidate for meta-learning tasks that operate directly in weight space.

Traditional weight space learning faces fundamental challenges: neural network weights are ambiguous due to permutation symmetry, and functionally identical networks can be arbitrarily far apart in parameter space. The mLoRA formulation addresses these challenges by:

  1. Constraining optimization to a structured subspace via the pre-trained base model
  2. Preserving channel alignment through multiplicative (rather than additive) updates
  3. Eliminating permutation symmetry via asymmetric masking

These properties suggest that mLoRA weights can serve as well-behaved representations for meta-learning tasks such as weight alignment/merging, performance prediction, and membership inference attacks, all of which benefit from structured, semantically meaningful weight spaces.
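
To make the distinction concrete, here is a minimal PyTorch sketch contrasting a standard additive LoRA adapter with one plausible multiplicative variant, in which the low-rank term rescales the base weights elementwise. The exact mLoRA formulation (including the asymmetric masking) is the one defined in [1]; the MultiplicativeLoRA module below is an illustrative assumption, not the paper’s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveLoRA(nn.Module):
    """Standard LoRA: W_eff = W + (alpha / r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts at W
        self.scale = alpha / r

    def forward(self, x):
        w_eff = self.base.weight + self.scale * (self.B @ self.A)
        return F.linear(x, w_eff, self.base.bias)

class MultiplicativeLoRA(nn.Module):
    """Illustrative multiplicative variant: W_eff = W * (1 + (alpha / r) * B @ A).
    The low-rank term modulates each base weight instead of adding to it, so the
    base model's channel structure is preserved; see [1] for the actual mLoRA
    formulation and its asymmetric masking."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: W_eff = W at start
        self.scale = alpha / r

    def forward(self, x):
        w_eff = self.base.weight * (1.0 + self.scale * (self.B @ self.A))
        return F.linear(x, w_eff, self.base.bias)
```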

Type of work:

  • MS Level: master’s project
  • 100% Research

Approach

This project will systematically evaluate mLoRA weight representations on meta-learning benchmarks, leveraging the discovered linear mode connectivity and semantic structure. The student should:

  1. Familiarize: Study the multiplicative LoRA formulation and the fundamentals of weight space learning
    • Understand the difference between additive LoRA and multiplicative LoRA
    • Study why multiplicative LoRA preserves channel alignment (Corollary 1 in [1])
    • Review existing weight space learning literature [2] and understand key challenges (permutation symmetry, loss landscape geometry)
  2. Task Selection and Evaluation: Choose one or two meta-learning tasks to focus on. Some examples:
      • Weight Alignment / Merging: Since mLoRA-Asym weights converge to linear modes, simple averaging should produce functional merged models (see the merging sketch after this list). Evaluate:
        • Task arithmetic [3]: Can arithmetic operations (addition, negation) in mLoRA weight space transfer capabilities?
        • Multi-task merging: Can mLoRA weights from different tasks be merged without alignment?
        • Compare against existing alignment methods (Git Re-Basin [4], optimal transport)
      • Model Manipulation / Editing: The semantic structure of mLoRA weights suggests they could be used to manipulate or edit models in interpretable ways. Building on [5], which discovers semantic linear directions in LoRA weight space of customized diffusion models, evaluate:
        • Can we identify linear directions in mLoRA space that correspond to semantic attributes?
        • Can we edit model behavior by moving along these directions (e.g., adding/removing concepts)?
        • Does mLoRA’s improved structure yield more disentangled and interpretable editing directions compared to additive LoRA?
        • Explore weight space interpolation for smooth transitions between model behaviors
      • Membership Inference Attack (Data Usage Detection): Since mLoRA weights encode the semantic structure of the training data, they may leak membership information. Evaluate:
        • Train classifiers to detect whether specific samples were in the training set
        • Analyze what information is encoded in different weight components
  3. Experimental Design:
    • Extend the evaluation to different model types: image diffusion models, language models, classification models, etc.
    • Design appropriate baselines: additive LoRA, standalone MLP weights, and latent codes produced by various weight encoders
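
As a starting point for the merging track mentioned above, the sketch below shows naive weight averaging and task arithmetic in the style of [3], operating directly on adapter state dicts. It assumes mLoRA weights from different runs/tasks live in the same structured subspace, which is precisely the hypothesis to test; the function names are placeholders.

```python
import torch

def average_merge(state_dicts):
    """Naive merging: elementwise mean of adapter weights across runs or tasks.
    If mLoRA-Asym runs really converge to a single linear mode, this alone
    should already yield a functional merged model."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0) for k in keys}

def task_vector(finetuned, base):
    """Task vector in weight space, as in task arithmetic [3]: tau = theta_ft - theta_base."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_arithmetic(base, vectors, coeffs):
    """theta = theta_base + sum_i lambda_i * tau_i: addition of capabilities
    with lambda_i > 0, negation/forgetting with lambda_i < 0."""
    merged = {k: v.clone() for k, v in base.items()}
    for vec, lam in zip(vectors, coeffs):
        for k in merged:
            merged[k] = merged[k] + lam * vec[k]
    return merged
```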

Prerequisites

  • Proficiency in Python and experience with PyTorch
  • Familiarity with neural network optimization and loss landscapes
  • Understanding of Low-Rank Adaptation (LoRA) and fine-tuning methods
  • Interest in weight space learning and neural network interpretability

Supervisor

Zhuoqian (Zack) Yang, [email protected]

References

[1] Yang, Zhuoqian, Mathieu Salzmann, and Sabine Süsstrunk. “Weight Space Representation Learning with Neural Fields.” arXiv preprint arXiv:2512.01759 (2025).

[2] Schürholt, Konstantin, et al. “Neural network weights as a new data modality.” ICLR 2025 Workshop Proposals (2024). See also: https://github.com/Zehong-Wang/Awesome-Weight-Space-Learning

[3] Ilharco, Gabriel, et al. “Editing models with task arithmetic.” ICLR (2023).

[4] Ainsworth, Samuel K., Jonathan Hayase, and Siddhartha Srinivasa. “Git re-basin: Merging models modulo permutation symmetries.” ICLR (2023).

[5] Dravid, Amil, et al. “Interpreting the weight space of customized diffusion models.” arXiv preprint arXiv:2406.09413 (2024).

[6] Frankle, Jonathan, et al. “Linear mode connectivity and the lottery ticket hypothesis.” ICML (2020).

[7] Lim, Soon Hoe, et al. “An Empirical Analysis on the Linear Mode Connectivity of Neural Network.” ICLR (2024).

Description

Film photography continues to thrive among enthusiasts and professionals who value its unique aesthetic qualities. Digitizing film negatives, however, has become increasingly challenging as the consumer film scanner industry has stagnated: consumer film scanners use outdated sensors, motivating photographers to scan with digital cameras instead, a superior but technically complex alternative [1].

Color negative inversion requires specialized algorithms (compared to standard software such as Lightroom or Photoshop) due to fundamental differences between film and digital sensors [2]. Current methods demand technical expertise and a dedicated measurement process for film characteristics (Dmin, Dmax, characteristic curves) [3], creating barriers for amateur photographers.

Our preliminary experiments show that statistical analysis can automatically estimate the essential parameters from the scanned images themselves, eliminating the need for direct density measurements while maintaining quality. The prototype software has been well received by both amateur and professional photographers.
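
For intuition, here is a minimal NumPy sketch of the kind of estimation involved: per-channel Dmin/Dmax are approximated from percentiles of the scan’s density distribution rather than measured with a densitometer, and a simple power curve stands in for the characteristic curve. This is an illustrative simplification, not the toolkit’s actual algorithm.

```python
import numpy as np

def invert_negative(scan_linear: np.ndarray,
                    lo_pct: float = 0.1, hi_pct: float = 99.9,
                    gamma: float = 0.6) -> np.ndarray:
    """Invert a linear-light RGB scan of a color negative (H x W x 3 floats in (0, 1]).

    Dmin (film base + fog) and Dmax are estimated per channel from density
    percentiles instead of direct density measurements."""
    density = -np.log10(np.clip(scan_linear, 1e-6, 1.0))  # optical density
    d_min = np.percentile(density, lo_pct, axis=(0, 1))   # thinnest areas ~ Dmin
    d_max = np.percentile(density, hi_pct, axis=(0, 1))   # densest areas ~ Dmax
    norm = (density - d_min) / np.maximum(d_max - d_min, 1e-6)
    # Crude placeholder for the film's characteristic curve (color negatives are
    # developed to a gamma of roughly 0.6); a real pipeline models the full curve [3].
    return np.clip(norm, 0.0, 1.0) ** (1.0 / gamma)
```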

Type of work:

  • Bachelor Level: Semester Project
  • 100% Development

Approach

This project continues the development of a toolkit that uses statistical methods to automatically estimate film parameters. The student will extend it with:

  1. Multi-Image Parameter Estimation: Statistical aggregation across multiple frames for improved accuracy
  2. Batch Processing Pipeline: Efficient whole-roll processing with frame consistency
  3. Faster RAW Image Loading: Incorporating the RawSpeed library [4]
  4. OpenGL-based GUI: Responsive interface with real-time preview
  5. Algorithm Enhancement (optional): Advanced statistical methods.

Prerequisites

  • Strong programming skills in Python / C++
  • Familiarity with version control (Git) and collaborative development
  • Helpful but not required:
    • Experience with analog photographic processes
    • Knowledge of computational photography and color science

Supervisor

Zhuoqian (Zack) Yang, [email protected]

References

[1] Tran, A. (2016, March 7). How to digitise film negatives using a DSLR. Ant Tran Blog. https://www.anttran.com/blog/2016/3/7/how-to-digitise-negatives-using-a-dslr

[2] Hunt, R. W. G. (1995). The reproduction of colour (5th ed.). Fountain Press.

[3] Patterson, R. (2001, October 2). Understanding Cineon. Illusion Arts. http://www.digital-intermediate.co.uk/film/pdf/Cineon.pdf

[4] darktable-org. (n.d.). RawSpeed [Computer software]. GitHub. https://github.com/darktable-org/rawspeed

Description

Film photography’s distinctive “look” comes in part from its ability to record and compress high-dynamic-range light information, especially in the highlights, without clipping [1]. By preserving subtle gradations in highlight and shadow areas and compressing them, film reveals rich color nuances, a key contributor to its signature aesthetic.
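
The highlight behavior described above can be illustrated with a toy roll-off curve: linear response up to a shoulder point, then smooth asymptotic compression instead of hard clipping. The functional form and constants below are arbitrary illustrations; real film follows its measured characteristic curve [1].

```python
import numpy as np

def filmic_highlight_rolloff(x: np.ndarray, shoulder: float = 0.8) -> np.ndarray:
    """Map linear intensities into [0, 1): identity below `shoulder`, smooth
    exponential compression above it, so values > 1 are compressed rather
    than clipped (continuous, with matching slope at the shoulder)."""
    x = np.asarray(x, dtype=np.float64)
    span = 1.0 - shoulder
    compressed = shoulder + span * (1.0 - np.exp(-np.maximum(x - shoulder, 0.0) / span))
    return np.where(x <= shoulder, x, compressed)
```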

Digital film emulation has become increasingly popular, but most applications (e.g., Dazz, Dehancer, VSCO) assume high-quality input captures while, in practice, operating on images from relatively limited consumer camera sensors. These images tend to have a low dynamic range and lose the highlight and shadow detail that film retains, making it impossible for current emulators to reproduce nuanced tones via compression.

Our preliminary validation has confirmed that high dynamic range (HDR) data significantly improves the quality of film simulation, particularly in preserving the characteristic highlight roll-off and shadow detail that define authentic film aesthetics. This validation establishes the critical importance of recovering lost dynamic range information before applying film simulation techniques.

Type of work:

  • MS Level: master’s project
  • 100% Research

Approach

Building on our validated hypothesis, this project will develop a deep learning framework that recovers high dynamic range RAW-equivalent images from standard RGB inputs captured by consumer-grade sensors. We will extend the state-of-the-art RAW-Diffusion model [2] by training it on carefully designed synthetic datasets that specifically target highlight and shadow reconstruction.

Our approach involves:

  1. Synthetic Training Data Generation: Creating paired datasets of clipped RGB images and their corresponding full dynamic range RAW data, with special emphasis on highlight and shadow regions (see the sketch below)
  2. Model Architecture Extension: Adapting RAW-Diffusion’s diffusion-based architecture to focus on reconstructing missing information in over- and underexposed regions
  3. Film Simulation Pipeline Integration: Feeding the reconstructed HDR data into physically accurate film simulation models to achieve authentic film characteristics

The final framework should enable consumer cameras to produce images with the nuanced highlight compression, smooth tonal transitions, and rich color gradations characteristic of analog film.
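
As a concrete starting point for step 1, the sketch below generates a training pair from an HDR/RAW-like linear image in the spirit of unprocessing-style pipelines [3, 4]: simulate exposure and sensor saturation, then apply the standard sRGB encoding. The names and constants are illustrative; the actual data generation would need to model the full ISP [5].

```python
import numpy as np

def make_training_pair(hdr_linear: np.ndarray, exposure: float = 1.0,
                       white_level: float = 1.0):
    """Simulate a clipped SDR capture from an HDR linear image.

    hdr_linear: float array in linear light; values may exceed white_level.
    Returns (clipped_srgb_input, hdr_linear_target)."""
    exposed = hdr_linear * exposure
    clipped = np.clip(exposed, 0.0, white_level)            # sensor saturation
    srgb = np.where(clipped <= 0.0031308,                   # standard sRGB OETF
                    12.92 * clipped,
                    1.055 * np.power(clipped, 1.0 / 2.4) - 0.055)
    return srgb.astype(np.float32), exposed.astype(np.float32)
```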

Prerequisites

  • Proficiency in Python and experience with PyTorch.
  • Familiarity with digital imaging pipelines and RAW image formats.
  • Interest in photography and knowledge of film characteristics.

Supervisor

Zhuoqian (Zack) Yang, [email protected]

References

[1] Attridge, G. G. “The characteristic curve.” The Journal of photographic science 39.2 (1991): 55-62.

[2] Reinders, Christoph, et al. “RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation.” arXiv preprint arXiv:2411.13150 (2024).

[3] Brooks, Tim, et al. “Unprocessing images for learned raw denoising.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

[4] Zamir, Syed Waqas, et al. “Cycleisp: Real image restoration via improved data synthesis.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[5] Kim, Woohyeok, et al. “Paramisp: learned forward and inverse ISPS using camera parameters.” arXiv preprint arXiv:2312.13313 (2023).

Diffusion and flow-matching models generate images by iteratively denoising an initial noise sample, typically drawn from pure independent white Gaussian noise. Despite their ability to generalize (most of the time they seem to generate genuinely new images), recent works [1, 2, 3, 4] have shown that these models occasionally replicate images from their training data. This is undesirable, as it raises copyright, ethical, and legal concerns.
 
Existing works have primarily focused on detecting and mitigating this memorization through indirect mechanisms. For instance, [2] detects memorization from the magnitude of the classifier-free-guidance term used during generation, [3] introduces a variant of classifier-free guidance to reduce memorization, and [4] identifies neurons whose activations trigger memorization in the diffusion model. These approaches do not analyze memorization directly from the core principle of diffusion/flow-matching models itself.
 
The core principle of diffusion and flow-matching models is to train a denoiser which, given a noisy image and a noise level, outputs the average of all possible clean images that could have produced this noisy image.
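
This “average” has a closed form when the data distribution is taken to be the empirical distribution of the training set, which gives a reference prediction to compare the trained model against. Below is a minimal PyTorch sketch, assuming the variance-exploding parameterization x_t = x_0 + sigma * eps; flow-matching or variance-preserving schedules need the corresponding rescaling.

```python
import torch

def empirical_mmse_denoiser(x_t, train_images, sigma):
    """Closed-form E[x_0 | x_t] when x_0 is uniform over the training images
    and x_t = x_0 + sigma * eps with eps ~ N(0, I).

    x_t: (C, H, W) noisy image; train_images: (N, C, H, W)."""
    diffs = train_images - x_t.unsqueeze(0)
    log_w = -diffs.flatten(1).pow(2).sum(dim=1) / (2 * sigma ** 2)
    w = torch.softmax(log_w, dim=0)                      # posterior over training images
    return torch.einsum('n,nchw->chw', w, train_images)  # posterior mean
```

If the posterior weights w are nearly one-hot across many noise samples, the “average” collapses onto a single training image, which is one concrete way memorization could surface in the denoising prediction.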
 
In this research project, we want to analyze the issue of memorization from this core principle. Specifically, the following questions are of interest:

  • Can we identify and fix memorization by checking properties of the denoising prediction?
  • Does the denoising prediction indeed look like “the average of all possible clean images that could have produced this noisy image”?
  • If not, is that because of memorization?
  • Can we modify the prediction accordingly to remove the memorization?
 
[1] Chen, Y., Wang, S., Zou, D., & Ma, X. (2024). Extracting training data from unconditional diffusion models. arXiv preprint arXiv:2410.02467.
[2] Wen, Y., Liu, Y., Chen, C., & Lyu, L. (2024). Detecting, explaining, and mitigating memorization in diffusion models. In The Twelfth International Conference on Learning Representations.
[3] Chen, C., Liu, D., & Xu, C. (2024). Towards memorization-free diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8425-8434).
[4] Hintersdorf, D., Struppek, L., Kersting, K., Dziedzic, A., & Boenisch, F. (2024). Finding nemo: Localizing neurons responsible for memorization in diffusion models. Advances in Neural Information Processing Systems, 37, 88236-88278.
 
Deliverables: Code, well cleaned up and easily reproducible, as well as a written report explaining the experiments and the steps taken during the project.
 
Prerequisites: Python and PyTorch.
 
Level:  Ideally MS research project (semester project), potentially BS research project (semester project)
 
Number of students: 1
 
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)

Diffusion and flow-matching models generate images by iteratively denoising an initial noise sample, typically sampled from pure independent white Gaussian noise.
 
Diffusion inversion refers to the process of recovering an initial noise sample that, when passed through a diffusion model (e.g., Stable Diffusion), generates a specific desired image. Inversion is useful because it lets us map images back into the model’s latent space (the space of initial noise samples), enabling applications such as image editing (slightly modifying the initial noise for controlled variation of the image) and model interpretability (understanding how the model encodes visual features in its latent space).
 
Most existing inversion techniques (like DDIM inversion [1]) achieve this by reversing the generation process: starting from a clean image, obtaining the model’s denoising prediction (the direction in which to move to denoise the image), and moving in the opposite direction. This approach assumes detailed access to the model internals and produces only one initial noise sample.
 
In this project, you will implement and analyze a black-box diffusion inversion algorithm that does not rely on reversing the denoising trajectory. Instead, the diffusion model will be treated purely as a generator: we can input initial noise vectors and observe the resulting images, without access to intermediate predictions or weights.
 
The algorithm iteratively searches for and constrains the components of the initial noise sample based on how they influence the generated image. The first step of the algorithm would be to constrain the lowest-frequency component (i.e., the average color) of the initial noise sample so as to generate images with the desired average color [2]. Subsequent steps would constrain additional components, until the model generates the desired image.
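
As an illustration of that first step, here is a small sketch, assuming a (C, H, W) noise tensor and following the observation of [2] that the channel means of the initial noise strongly influence the average color of the generated image. Later steps could analogously project onto further frequency bands.

```python
import torch

def constrain_mean(noise: torch.Tensor, target_mean: torch.Tensor) -> torch.Tensor:
    """Replace the per-channel mean (the lowest-frequency component) of an
    initial noise sample with a target value, keeping the high-frequency residual.

    noise: (C, H, W) initial noise; target_mean: (C,) desired channel means."""
    residual = noise - noise.mean(dim=(1, 2), keepdim=True)
    return residual + target_mean.view(-1, 1, 1)
```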
 
Steps of the project: literature review, implementation of the algorithm (details provided by the project supervisor), evaluation of the algorithm (failure cases, hyperparameters, etc.), exploration of potential applications, comparison with other methods, and improvement of the algorithm.
 
[1] Song, J., Meng, C., & Ermon, S. (2021). Denoising Diffusion Implicit Models. In International Conference on Learning Representations.
[2] Everaert, M. N., Fitsios, A., Bocchio, M., Arpa, S., Süsstrunk, S., & Achanta, R. (2024). Exploiting the signal-leak bias in diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4025-4034).
 
Deliverables: Code, well cleaned up and easily reproducible, as well as a written report explaining the experiments and the steps taken during the project.
Prerequisites: Python and PyTorch.
Level:  Ideally MS research project (semester project), potentially BS research project (semester project)
Number of students: 1
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)

Startup company Innoview has developed a software framework to create hidden watermarks printed on paper and to acquire and decode them with a smartphone. The acquisition pipeline on the smartphone comprises many separate parametrizable parts. The project consists of improving some of these parts in order to optimize the recognition rate of the hidden watermarks (under Android).

Deliverables:

  • Report and running prototype.

Prerequisites:

  • Basic knowledge of image processing and computer vision
  • Coding skills in Java (Android), C#, and/or Matlab

Level: BS or MS semester project

Supervisors:

Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09

Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. The goal is to improve these security features and to optimize them by simulating the interaction of light with these 3D lenslet structures, using the Blender software.

Deliverables:

  • Report and running prototype (Matlab). Blender lenslet simulations.

Prerequisites:

  • Knowledge of computer graphics and of the interaction of light with 3D mesh objects
  • Basic knowledge of Blender
  • Coding skills in Matlab

Level: BS or MS semester project

Supervisors:

Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09

Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Startup company Innoview has developed arrangements of transparent lenslets and of opaque structures that yield interesting moiré effects.

The goal is to create plastic objects composed of a revealing layer made of transparent lenses and of a base layer made of partly opaque structures. The superposition of the two layers shows interesting moiré evolutions. Once created as 3D volumes, the objects’ appearance can be simulated in Blender. After simulation and verification, these objects are to be printed with a 3D printer.

Deliverables:

  • Report and running prototype (Matlab). Blender lenslet simulations. Fabricated 3D objects showing the moiré evolutions.

Prerequisites:

  • Good knowledge of computer graphics, especially the construction of 3D mesh objects
  • Basic knowledge of Blender
  • Good coding skills in Matlab

Level: BS or MS semester project, or master’s project

Supervisors:

Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09

Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44