If you are interested in doing a research project (“semester project”) or a master’s project at IVRL, you can do this through the Master’s Programs in Data Science or in Computer Science. Note that you must be accredited to EPFL. This page lists available semester/master’s projects for the Fall 2026 semester. The order of the projects is random.
For other types of application (research assistantship, internship, etc.), please check this page.
Startup company Innoview has developed a software framework to create hidden watermarks printed on paper and to acquire and decode them with a smartphone. Smartphone acquisition comprises many separate parametrizable parts. The project consists of improving some parts of the acquisition pipeline in order to optimize the recognition rate of the hidden watermarks (under Android).
Deliverables:
- Report and running prototype.
Prerequisites:
- Basic knowledge of image processing and computer vision,
- Coding skills in Java (Android), C#, and/or Matlab
Level: BS or MS semester project
Supervisors:
Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09
Startup company Innoview has developed arrangements of transparent lenslets and of opaque structures that yield interesting moiré effects.
The goal is to create plastic objects composed of a revealing layer made of transparent lenses and of a base layer made of partly opaque structures. The superposition of the two layers shows interesting moiré evolutions. Once the objects are created as 3D volumes, their appearance can be simulated in Blender. After simulation and verification, these objects are to be printed with a 3D printer.
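As a rough orientation for these moiré evolutions, a simplified 1D model treats both layers as parallel line gratings: the base layer with period p_base and the revealing layer with period p_rev (idealizing the lenslet array as a set of sampling lines, which is an assumption here). The superposition then shows fringes whose frequency is the difference of the two grating frequencies, i.e. a period of p_base * p_rev / |p_rev - p_base|. The sketch below (Python with NumPy rather than Matlab; all function names are hypothetical) computes this period and checks it numerically by superposing two binary gratings and low-pass filtering the result. It ignores lens optics, 2D layouts and curved moirés.

```python
import numpy as np


def moire_period(p_base, p_rev):
    """Moiré period for two superposed parallel 1D gratings.

    Simplified model: the fringe frequency is |1/p_base - 1/p_rev|.
    """
    if np.isclose(p_base, p_rev):
        return np.inf  # equal periods: no beating, hence no moiré
    return p_base * p_rev / abs(p_rev - p_base)


def simulate_moire(p_base, p_rev, length, dx):
    """Superpose two binary gratings (50% duty cycle) and low-pass the result.

    Averaging over one revealer period removes the fine carrier, so only the
    slowly varying moiré profile remains.
    """
    x = np.arange(int(round(length / dx))) * dx
    base = (np.mod(x, p_base) < p_base / 2).astype(float)  # base layer lines
    rev = (np.mod(x, p_rev) < p_rev / 2).astype(float)     # idealized revealer
    k = int(round(p_rev / dx))
    profile = np.convolve(base * rev, np.ones(k) / k, mode="same")
    return x, profile


def dominant_period(signal, dx):
    """Period of the strongest non-DC frequency component."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=dx)
    idx = 1 + np.argmax(spectrum[1:])  # skip the DC bin
    return 1.0 / freqs[idx]


T = moire_period(1.0, 1.05)                           # about 21
x, profile = simulate_moire(1.0, 1.05, 210.0, 0.005)  # 10 moiré periods
T_measured = dominant_period(profile, 0.005)          # period found in the simulation
```

With a base period of 1.0 and a revealer period of 1.05 (arbitrary units, e.g. mm), the predicted moiré period is 21, so the base pattern appears magnified about 21 times; the simulated profile peaks at the same period.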
Deliverables:
Report and running prototype (Matlab). Blender lenslet simulations. Fabricated 3D objects showing the moiré evolutions.
Prerequisites:
1. Good knowledge of computer graphics, especially the construction of 3D mesh objects,
2. Basic knowledge of Blender,
3. Good coding skills in Matlab
Level: BS or MS semester project, master’s project
Supervisors:
Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09
Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. Simulating the interaction of light with these 3D lenslet structures can help improve these security features. The interaction of light with the lenslets is simulated by ray tracing, applying Snell’s law and the Fresnel equations. These ray-tracing-based simulations may also be compared with simulations obtained with the Blender software.
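The two physical laws named above reduce to a few lines per ray-surface interaction. The following sketch, written in Python with NumPy for illustration (the project itself targets Matlab, and the function names are hypothetical), refracts a unit ray direction with Snell's law in vector form, detects total internal reflection, and computes the Fresnel reflectance for unpolarized light as the mean of the s- and p-polarized terms. A ray tracer through a lenslet would split each ray at every lens surface into a reflected ray weighted by R and a refracted ray weighted by 1 - R.

```python
import numpy as np


def refract(d, n, eta_i, eta_t):
    """Refract unit direction d at a surface with unit normal n (Snell's law).

    n must point towards the incident side (dot(n, d) < 0).
    Returns (t, cos_i, cos_t), or (None, cos_i, None) on total internal reflection.
    """
    eta = eta_i / eta_t
    cos_i = -np.dot(n, d)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None, cos_i, None  # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    t = eta * d + (eta * cos_i - cos_t) * n  # unit transmitted direction
    return t, cos_i, cos_t


def reflect(d, n):
    """Mirror direction of d about the surface normal n."""
    return d - 2.0 * np.dot(d, n) * n


def fresnel_unpolarized(cos_i, cos_t, eta_i, eta_t):
    """Fresnel reflectance for unpolarized light: mean of s and p reflectances."""
    rs = (eta_i * cos_i - eta_t * cos_t) / (eta_i * cos_i + eta_t * cos_t)
    rp = (eta_t * cos_i - eta_i * cos_t) / (eta_t * cos_i + eta_i * cos_t)
    return 0.5 * (rs ** 2 + rp ** 2)


# Ray hitting a flat lens surface from air (n = 1.0) at 45 degrees, lens index 1.5
n = np.array([0.0, 0.0, 1.0])
d = np.array([np.sin(np.pi / 4), 0.0, -np.cos(np.pi / 4)])
t, cos_i, cos_t = refract(d, n, 1.0, 1.5)
R = fresnel_unpolarized(cos_i, cos_t, 1.0, 1.5)
T = 1.0 - R  # energy carried by the refracted ray
```

At 45° incidence from air into a material of index 1.5, roughly 5% of the light is reflected; at normal incidence the value is 4%.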
Deliverables:
Report and running simulation prototype (Matlab), ray-tracing based and Blender simulations for various configurations.
Prerequisites:
Knowledge of computer graphics, interaction of light with surfaces, basic knowledge of Blender, coding skills in Matlab
Level: BS or MS semester project
Supervisors:
Prof. Roger D. Hersch, BC320, [email protected], cell: 077 406 27 09
Dr. Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Introduction:
Image relighting is the problem of modifying the illumination of a scene captured in a photograph. Recent work by Careaga and Aksoy [1] introduced a physically controllable relighting pipeline for RGB images that allows users to insert explicit light sources, such as point lights, spot lights, and environmental illumination, into the scene. Their method combines monocular geometry estimation, intrinsic image decomposition [7, 8], physically-based rendering (PBR), and neural rendering in a self-supervised framework trained on real-world photograph collections.
However, this pipeline operates exclusively in the RGB color space, which inherently limits its physical accuracy. RGB captures only three broad spectral bands, leading to ambiguities such as metamerism, where materials with different spectral reflectance curves produce identical RGB values but respond differently to changes in illumination. Multi-spectral images, which capture many narrowband channels across the visible (and potentially near-infrared) spectrum, provide a much richer description of surface reflectance and illumination. This additional spectral information can reduce material ambiguities, enable more accurate intrinsic decomposition, and support physically faithful relighting under light sources with arbitrary spectral power distributions (SPDs).
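Metamerism can be made concrete with a small numerical experiment. Assume a hypothetical RGB camera with Gaussian spectral sensitivities sampled at 31 bands (400 to 700 nm), with spectra discretized as vectors. Any perturbation of a reflectance spectrum that lies in the null space of the 3 × 31 camera-times-illuminant matrix leaves the RGB response unchanged. The sketch below constructs such a metamer under an equal-energy illuminant. It then picks, within that null space, the direction that changes the response most under a second, warmer illuminant. A multi-spectral sensor with enough bands would distinguish the two surfaces under either light.

```python
import numpy as np

wl = np.arange(400, 701, 10)  # 31 wavelength samples in nm


def gaussian(center, width=30.0):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


# Hypothetical camera sensitivities, one row per channel (R, G, B)
S = np.stack([gaussian(600.0), gaussian(540.0), gaussian(450.0)])

E_flat = np.ones(wl.size)                  # equal-energy illuminant
E_warm = np.linspace(0.5, 1.5, wl.size)    # illuminant rising towards long wavelengths


def camera_rgb(reflectance, illuminant):
    """Discretized integral of sensitivity x illuminant x reflectance."""
    return S @ (illuminant * reflectance)


def make_metamer(r, E_ref, E_test, amplitude=0.4):
    """Return a reflectance with the same RGB as r under E_ref that differs
    as much as possible from r under E_test."""
    M_ref = S * E_ref                       # 3 x N, equals S @ diag(E_ref)
    _, _, Vt = np.linalg.svd(M_ref)
    N = Vt[3:]                              # orthonormal basis of the null space
    _, _, Wt = np.linalg.svd((S * E_test) @ N.T)
    v = N.T @ Wt[0]                         # null-space direction most visible under E_test
    v *= amplitude / np.max(np.abs(v))      # keep r + v inside a valid reflectance range
    return r + v


r1 = np.full(wl.size, 0.5)
r2 = make_metamer(r1, E_flat, E_warm)
rgb_ref = (camera_rgb(r1, E_flat), camera_rgb(r2, E_flat))
rgb_test = (camera_rgb(r1, E_warm), camera_rgb(r2, E_warm))
```

Both reflectances remain within [0.1, 0.9], so they are physically plausible, yet their RGB values only coincide under the first illuminant.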
This project proposes to extend the physically controllable relighting framework of Careaga and Aksoy [1] to multi-spectral images. By operating in the spectral domain, the method can leverage richer material information for more accurate intrinsic decomposition, perform physically correct light transport simulation across all spectral bands, and produce relit multi-spectral images that faithfully capture the interaction between spectral illumination and spectral reflectance.
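Under the multiplicative intrinsic model used by such pipelines, a multi-spectral image is the per-band product of spectral reflectance and spectral shading. Relighting therefore amounts to recomputing the shading for a new light and multiplying it back. The sketch below illustrates the simplest possible case, which is an assumption and far from what a path tracer such as Mitsuba 3 computes: Lambertian surfaces, a single point light with a given spectral power distribution (SPD), inverse-square falloff, and no shadows or interreflections. Inputs are per-pixel normals and 3D points (as would come from a geometry estimation stage) and an (H, W, B) reflectance cube. The function names are hypothetical.

```python
import numpy as np


def point_light_shading(normals, points, light_pos, spd):
    """Spectral shading of Lambertian surfaces lit by a single point light.

    normals, points: (H, W, 3); light_pos: (3,); spd: (B,).
    Returns an (H, W, B) shading cube. No shadows or interreflections.
    """
    to_light = light_pos - points
    dist2 = np.sum(to_light ** 2, axis=-1, keepdims=True)
    direction = to_light / np.sqrt(dist2)
    cos = np.clip(np.sum(normals * direction, axis=-1, keepdims=True), 0.0, None)
    return (cos / dist2) * spd / np.pi  # Lambert cosine, inverse-square falloff


def relight(reflectance, normals, points, light_pos, spd):
    """Intrinsic model: image = reflectance * shading, band by band."""
    return reflectance * point_light_shading(normals, points, light_pos, spd)


# Toy scene: a 3 x 3 patch of the plane z = 0 facing up, 31 spectral bands
g = np.linspace(-1.0, 1.0, 3)
ys, xs = np.meshgrid(g, g, indexing="ij")
points = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
normals = np.broadcast_to(np.array([0.0, 0.0, 1.0]), points.shape)
reflectance = np.full((3, 3, 31), 0.5)
light = np.array([0.0, 0.0, 1.0])

img_flat = relight(reflectance, normals, points, light, np.ones(31))
img_ramp = relight(reflectance, normals, points, light, np.linspace(0.2, 1.8, 31))
```

Changing only the SPD rescales each band of the shading independently, a wavelength-dependent effect that an RGB pipeline can only approximate with three broad channels.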
Methodology:
Data Preparation
- Use existing multi-spectral image datasets (e.g., KAUST reflectance dataset [5]) as the basis for training and evaluation.
- We will also collect our own multi-spectral datasets with a mobile phone equipped with a multi-spectral sensor (the lab will provide this device).
Spectral Intrinsic Decomposition
- This stage separates a multi-spectral image into spectral reflectance and shading components. Two approaches could be explored:
- Existing methods: Adopt and evaluate established multi-spectral intrinsic decomposition algorithms, such as MIID [2] (subspace-constrained Retinex) or low-rank factorization approaches [3]. These optimization-based methods offer interpretable priors (spectral smoothness, low-rank structure) and do not require training data.
- Learning-based approach: Train a neural network that takes multi-spectral input and outputs spectral reflectance and shading maps. The network can be pre-trained on synthetic spectral data (rendered with known ground truth) and optionally fine-tuned on real captures.
Geometry Estimation and Scene Reconstruction
- Estimate monocular depth from the multi-spectral input (using a pseudo-RGB conversion or a visible-band subset) with an existing monocular depth estimation model (e.g., Depth Anything [6]).
- Reconstruct a textured mesh of the scene using the estimated depth and spectral reflectance, creating a 3D representation suitable for spectral path tracing.
Spectral Physically-Based Rendering
- Allow users to define light sources with explicit spectral power distributions in the 3D scene (point lights, spot lights, area lights, environmental illumination with spectral HDR maps).
- Render the scene using Mitsuba 3’s spectral path tracing engine [4], producing an approximate multi-spectral rendering under the target illumination.
Spectral Neural Renderer
- Train a neural renderer that takes the approximate spectral PBR rendering as input and produces a photorealistic multi-spectral relighting result.
- Adapt the self-supervised training strategy of [1] to the spectral domain: use differentiable spectral rendering to reconstruct the original illuminant SPD from a multi-spectral image, generating training pairs (spectral PBR rendering, real multi-spectral image) without explicit relighting ground truth.
Training and Evaluation
- Train and evaluate the full pipeline on multi-spectral relighting tasks. Compute the peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) between the original input multi-spectral image and the output from the neural renderer.
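A minimal sketch of the two metrics on (H, W, B) multi-spectral cubes is given below, for reference. It assumes values are normalized to [0, 1], so that the PSNR peak is 1.0. Both metrics are computed over all pixels and bands jointly. The per-band PSNR helper is an optional addition, not part of the project description, that can reveal whether errors concentrate in particular wavelengths. Function names are hypothetical.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error over all pixels and spectral bands."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(reference, estimate, peak=1.0):
    """PSNR in dB, assuming intensities normalized so that `peak` is the maximum value."""
    err = rmse(reference, estimate)
    return float("inf") if err == 0.0 else float(20.0 * np.log10(peak / err))


def per_band_psnr(reference, estimate, peak=1.0):
    """PSNR of each band of (H, W, B) cubes."""
    return np.array([psnr(reference[..., b], estimate[..., b], peak)
                     for b in range(reference.shape[-1])])


reference = np.zeros((4, 4, 31))
estimate = np.full((4, 4, 31), 0.1)  # uniform error of 0.1 gives 20 dB
```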
Deliverables:
- Well-documented code and trained models for the whole relighting pipeline.
- A final report detailing methodology, experiments, results, and analysis.
Type of work:
Master semester project
50% research, 50% engineering
Prerequisites:
Proficiency in coding with deep learning frameworks (e.g., PyTorch)
Familiarity with image processing and computer vision fundamentals
Basic understanding of physically-based rendering and spectral imaging
Supervisor:
Liying Lu ([email protected])
References:
[1]. Careaga, Chris, and Yağız Aksoy. “Physically Controllable Relighting of Photographs.” Proceedings of ACM SIGGRAPH. 2025.
[2]. Huang, Qian, et al. “Multispectral Image Intrinsic Decomposition via Subspace Constraint.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018.
[3]. Zheng, Yinqiang, et al. “Illumination and Reflectance Spectra Separation of a Hyperspectral Image Meets Low-Rank Matrix Factorization.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[4]. Jakob, Wenzel, et al. “Mitsuba 3 renderer.” 2022. https://mitsuba-renderer.org.
[5]. Li, Yuqi, et al. “Multispectral illumination estimation using deep unrolling network.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[6]. Lin, Haotong, et al. “Depth anything 3: Recovering the visual space from any views.” arXiv preprint arXiv:2511.10647 (2025).
[7]. Careaga, Chris, and Yağız Aksoy. “Colorful diffuse intrinsic image decomposition in the wild.” ACM Transactions on Graphics 43.6 (2024): 1-12.
[8]. Careaga, Chris, and Yağız Aksoy. “Intrinsic image decomposition via ordinal shading.” ACM Transactions on Graphics 43.1 (2023): 1-24.
Introduction:
Color constancy is the problem of estimating the true colors of objects in a scene under varying illumination conditions. Traditional methods rely on low-level statistics of the image, such as Gray-World or White-Patch assumptions, but these approaches often fail in complex real-world scenes. Recent deep-learning methods [1,2,3,4,5] improve performance by learning from large datasets, yet they primarily focus on pixel-level or patch-level cues, ignoring higher-level semantic information.
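The two classical assumptions can be made concrete with a small sketch (Python with NumPy; function names are hypothetical) for linear RGB images of shape (H, W, 3). Gray-World takes the mean image color as the illuminant estimate. White-Patch takes the per-channel maximum; here a high percentile is used, a common robustness tweak. The estimate is then removed with a diagonal, von Kries-style correction. Both assumptions break down, for instance, when a single colored surface dominates the scene.

```python
import numpy as np


def gray_world(img):
    """Gray-World: the average scene reflectance is assumed achromatic,
    so the mean image color is taken as the illuminant color."""
    est = img.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)


def white_patch(img, percentile=99.0):
    """White-Patch (max-RGB): the brightest response per channel is assumed
    to come from a white surface. A percentile below 100 reduces noise sensitivity."""
    est = np.percentile(img.reshape(-1, 3), percentile, axis=0)
    return est / np.linalg.norm(est)


def correct(img, illuminant):
    """Diagonal (von Kries-style) correction that maps the estimated
    unit-norm illuminant to the neutral direction (1, 1, 1) / sqrt(3)."""
    return img / (illuminant * np.sqrt(3.0))


# Toy scene: two colored surfaces whose average is gray, plus one white surface
reflectance = np.array([[[0.2, 0.5, 0.8], [0.8, 0.5, 0.2], [1.0, 1.0, 1.0]]])
light = np.array([1.0, 0.8, 0.5])
img = reflectance * light
est_gw = gray_world(img)
est_wp = white_patch(img, percentile=100.0)
balanced = correct(img, est_wp)
```

On this toy scene both assumptions hold by construction, so both estimates recover the illuminant direction and the white pixel becomes neutral after correction.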
Semantic information can provide strong priors for color constancy [6]. For example, knowing that a region corresponds to the sky, foliage, or human skin allows the algorithm to better infer its true color regardless of illumination. Modern segmentation and detection methods, such as SAM (Segment Anything Model [7]), make it feasible to extract semantic cues from images efficiently.
This project proposes to leverage semantic information as an additional cue for color constancy. By integrating semantic segmentation maps with existing color constancy networks, the model can use object-level knowledge to guide illuminant estimation and improve color correction.
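The simplest form of this integration, concatenating semantic maps with the image as extra input channels, can be sketched as follows. It assumes an integer label map of shape (H, W) with a fixed set of classes. Note that SAM on its own produces class-agnostic masks, so class labels would have to come from an additional labelling step. The function names, class assignments and channel-first layout are illustrative choices.

```python
import numpy as np


def one_hot_segmentation(labels, num_classes):
    """(H, W) integer label map -> (num_classes, H, W) one-hot float array."""
    classes = np.arange(num_classes)[:, None, None]
    return (classes == labels[None]).astype(np.float32)


def semantic_input(image_chw, labels, num_classes):
    """Stack a (3, H, W) linear RGB image with one-hot semantic channels
    into a (3 + num_classes, H, W) network input."""
    return np.concatenate(
        [image_chw.astype(np.float32), one_hot_segmentation(labels, num_classes)], axis=0
    )


image = np.random.default_rng(0).random((3, 2, 2))
labels = np.array([[0, 1], [2, 1]])   # e.g. 0 = sky, 1 = foliage, 2 = skin (hypothetical)
x = semantic_input(image, labels, num_classes=3)
```

The resulting (3 + C)-channel array can be fed to a convolutional color constancy network whose first layer is widened accordingly; an attention-based design could instead keep the image and semantic streams separate.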
Methodology:
- Data Preparation
- Use existing color constancy datasets (e.g., LSMI [3]).
- Extract semantic information using pretrained models such as SAM.
- Baseline Implementation
- Implement or reproduce a standard deep-learning-based color constancy network using only image-level features.
- Semantic Integration
- Incorporate semantic information by concatenating segmentation maps or embedding semantic features alongside image features.
- Explore attention-based mechanisms to allow the network to weigh semantic cues appropriately.
- Training and Evaluation
- Train the semantic-guided network and compare performance with baseline methods.
- Evaluate using standard metrics such as angular error and mean-squared error on illuminant estimation.
- Analysis
- Perform ablation studies to understand the contribution of semantic cues.
- Analyze cases where semantic information most improves or fails to improve performance.
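The main evaluation metric above, the recovery angular error between estimated and ground-truth illuminant colors, ignores their overall intensity. A minimal sketch is given below. It includes a per-pixel variant for spatially varying illuminant maps, as in multi-illuminant datasets such as LSMI. It also includes the summary statistics (mean, median, best 25%, worst 25%) that color constancy papers commonly report. Function names are hypothetical.

```python
import numpy as np


def angular_error_deg(estimate, ground_truth):
    """Angle in degrees between two RGB illuminant vectors (scale-invariant)."""
    e = np.asarray(estimate, dtype=np.float64)
    g = np.asarray(ground_truth, dtype=np.float64)
    cos = np.dot(e, g) / (np.linalg.norm(e) * np.linalg.norm(g))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def per_pixel_angular_error_deg(est_map, gt_map, eps=1e-12):
    """Per-pixel angular error for (H, W, 3) illuminant maps."""
    dot = np.sum(est_map * gt_map, axis=-1)
    norms = np.linalg.norm(est_map, axis=-1) * np.linalg.norm(gt_map, axis=-1)
    cos = dot / np.maximum(norms, eps)  # eps only guards against all-zero pixels
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def summarize(errors):
    """Mean, median, best-25% and worst-25% averages of a set of errors."""
    errors = np.sort(np.asarray(errors, dtype=np.float64))
    q = max(1, errors.size // 4)
    return {
        "mean": float(errors.mean()),
        "median": float(np.median(errors)),
        "best25": float(errors[:q].mean()),
        "worst25": float(errors[-q:].mean()),
    }


err_orthogonal = angular_error_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
err_scaled = angular_error_deg([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
stats = summarize([4.0, 1.0, 3.0, 2.0])
```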
Deliverables:
- Well-documented code for semantic extraction and semantic-guided color constancy.
- Trained models capable of leveraging semantic information for illuminant estimation.
- Experimental results comparing the semantic-guided model against baselines.
- A final report detailing methodology, experiments, results, and analysis.
Type of work:
Master / bachelor semester project
80% research, 20% engineering
Prerequisites:
Proficiency in coding with deep learning frameworks (e.g., PyTorch)
Familiarity with image processing and computer vision fundamentals
Supervisor:
Liying Lu ([email protected])
References:
[1]. Afifi, Mahmoud, Marcus A. Brubaker, and Michael S. Brown. “Auto white-balance correction for mixed-illuminant scenes.” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022.
[2]. Kim, Dongyoung, et al. “Attentive illumination decomposition model for multi-illuminant white balancing.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[3]. Kim, Dongyoung, et al. “Large scale multi-illuminant (LSMI) dataset for developing white balance algorithm under mixed illumination.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[4]. Barron, Jonathan T. “Convolutional color constancy.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
[5]. Afifi, Mahmoud, et al. “Cross-camera convolutional color constancy.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[6]. Lindner, Albrecht, and Sabine Süsstrunk. “Semantic-improved color imaging applications: It is all about context.” IEEE Transactions on Multimedia 17.5 (2015): 700-710.
[7]. Kirillov, Alexander, et al. “Segment anything.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Description
Our recent work, Weight Space Representation Learning via Neural Field Adaptation [1], showed that the weights of neural fields, also known as implicit neural representations (INRs), can be treated as a first-class data modality. By adapting a shared base field to each signal, the resulting per-signal weights become structured and well-aligned, and they are free of much of the permutation ambiguity that plagues weight space learning [2]. This makes them amenable to encoding, comparison, and generation.
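The permutation ambiguity mentioned above is easy to demonstrate. Reordering the hidden units of an MLP (rows of the first weight matrix and bias, columns of the second) changes the weight vector but not the function it computes. The toy sketch below checks this for a two-layer network with a sine activation; the sizes and random weights are arbitrary. Adapting every signal from one shared base field is meant to keep hidden units in corresponding positions, so that such equivalent reorderings do not scatter functionally similar fields across weight space.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, W1, b1, W2, b2):
    """Two-layer network f(x) = W2 sin(W1 x + b1) + b2, applied row-wise."""
    return np.sin(x @ W1.T + b1) @ W2.T + b2


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units consistently in both layers."""
    return W1[perm], b1[perm], W2[:, perm]


d_in, d_hidden, d_out = 2, 16, 3
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

perm = rng.permutation(d_hidden)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = rng.uniform(-1.0, 1.0, size=(100, d_in))
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
weight_distance = np.linalg.norm(W1 - W1p) + np.linalg.norm(W2 - W2p)
```

The two parameter sets are far apart as vectors yet define the same field, which is why naive weight-space learning must cope with many equivalent copies of every signal.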
So far this has mostly been demonstrated on a limited set of modalities (e.g., images / 2D fields). The central premise of this project is that, because a neural field is just a function f_θ: coordinates → values, the same weight-space machinery should apply to any signal that can be fit by an INR, regardless of its native format. The goal is to push toward a single, data-agnostic generative model that learns a prior over neural-field weights and can synthesize new fields across heterogeneous modalities.
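The sketch below illustrates why the same machinery can, in principle, cover several modalities. It builds a SIREN-style network [3] in plain NumPy, using sine activations with the initialization scheme described by Sitzmann et al. and a frequency factor of 30. It then evaluates the network on 1D time coordinates for audio and on a 3D grid for volumetric data. Only the input and output dimensions change, and the flattened parameter vector is what a weight-space model would consume. Bias initialization, layer sizes and function names are simplifying assumptions, and the networks are untrained here.

```python
import numpy as np


def init_siren(in_dim, hidden, depth, out_dim, omega0=30.0, seed=0):
    """Random SIREN weights: first layer U(-1/n, 1/n), later layers
    U(-sqrt(6/n)/omega0, sqrt(6/n)/omega0), with n the layer's fan-in."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * depth + [out_dim]
    layers = []
    for i, (n_in, n_out) in enumerate(zip(dims[:-1], dims[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / omega0
        W = rng.uniform(-bound, bound, size=(n_out, n_in))
        b = rng.uniform(-bound, bound, size=n_out)  # simplification for the biases
        layers.append((W, b))
    return layers


def siren(coords, layers, omega0=30.0):
    """Evaluate the field at (num_points, in_dim) coordinates."""
    h = coords
    for W, b in layers[:-1]:
        h = np.sin(omega0 * (h @ W.T + b))   # sine layers
    W, b = layers[-1]
    return h @ W.T + b                        # linear output layer


def flatten(layers):
    """Concatenate all parameters into one weight-space vector."""
    return np.concatenate([np.concatenate([W.ravel(), b.ravel()]) for W, b in layers])


# Audio: 1D time coordinates -> amplitude
audio_net = init_siren(in_dim=1, hidden=64, depth=3, out_dim=1)
t = np.linspace(-1.0, 1.0, 16000)[:, None]
waveform = siren(t, audio_net)

# Volume (e.g. MRI): 3D coordinates -> intensity
mri_net = init_siren(in_dim=3, hidden=64, depth=3, out_dim=1)
grid_1d = np.linspace(-1.0, 1.0, 8)
xyz = np.stack(np.meshgrid(grid_1d, grid_1d, grid_1d, indexing="ij"), axis=-1).reshape(-1, 3)
intensities = siren(xyz, mri_net)
theta = flatten(mri_net)  # 8641 parameters for this configuration
```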
This is an ambitious, research-heavy project building directly on [1], with three intertwined thrusts: (1) improving the core adaptation and representation method, (2) expanding to new and harder data modalities (audio, MRI / volumetric medical data, and others), and (3) using these to design and advocate for a modality-agnostic generative model of neural fields.
If executed well, this project has strong potential to result in a submission to a top-tier AI/ML conference (e.g. NeurIPS, ICML, ICLR, CVPR).
Type of work:
- MS Level: master project
- 100% Research
Goals
1. Improve the core method. Strengthen the weight-space representation pipeline from [1]. Candidate directions:
- Improve weight space structure with techniques such as asymmetric freezing [4]
2. Expand data modalities. Validate that the representation is truly signal-agnostic by fitting and learning over fields for:
- Audio (1D temporal signals / waveforms, spectrograms).
- MRI and volumetric medical data (3D scalar/vector fields), where INRs are already attractive for compression and super-resolution.
- Climate data or other underexplored data modalities.
3. Build a data-agnostic generative model. Using the structured weight space as the target, design a generative prior over neural-field weights. A key enabling observation from [1] is that the per-instance weights converge to a linear mode: rather than lying on a complex manifold, each data instance is represented by a straight line in weight space (still a line rather than a single point, but far simpler than the manifolds it would otherwise occupy). This dramatically simplified geometry makes it theoretically tractable to fit a generative model directly over the line representations.
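The toy sketch below illustrates the asymmetric freezing mentioned under the first goal. It assumes that each signal is fit by a low-rank update W = W0 + BA of a shared linear layer. The shared base W0 and a random factor A are frozen for all signals, and only B is optimized per signal, in the spirit of [4]. The per-signal representation is then just B. This is a single linear layer with synthetic data and plain gradient descent, not the pipeline of [1]; all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n = 16, 8, 4, 256

W0 = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # shared base layer (frozen)
A = rng.normal(size=(rank, d_in)) / np.sqrt(d_in)    # random down-projection (frozen)


def fit_signal(X, Y, steps=500, lr=0.5):
    """Fit only the per-signal factor B of W = W0 + B @ A by gradient descent on MSE."""
    B = np.zeros((d_out, rank))
    losses = []
    for _ in range(steps):
        residual = X @ (W0 + B @ A).T - Y                  # (n, d_out)
        losses.append(float(np.mean(residual ** 2)))
        grad_W = 2.0 * residual.T @ X / residual.size      # dLoss/dW, (d_out, d_in)
        B -= lr * grad_W @ A.T                             # chain rule: dLoss/dB = dLoss/dW @ A^T
    return B, losses


# A synthetic "signal" that the adapter can represent exactly
B_true = rng.normal(size=(d_out, rank))
X = rng.normal(size=(n, d_in))
Y = X @ (W0 + B_true @ A).T
B_hat, losses = fit_signal(X, Y)   # B_hat is this signal's weight-space code
```

Because A is identical for every signal, the learned B matrices of different signals share a common coordinate system. This offers one possible intuition for why freezing can improve weight-space structure.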
Approach
A suggested progression (the student is encouraged to adapt it):
- Familiarize: Reproduce key results of [1], study weight-space learning fundamentals [2] and the relevant INR architectures [3].
- Method improvements: Pick 1-2 concrete weaknesses of the current pipeline and address them, measuring effect on reconstruction and weight-space structure.
- New modalities: Bring up an audio and an MRI/volumetric pipeline; build datasets of fitted fields; analyze cross-modal properties.
- Generative model: Train a weight-space generative model, starting per-modality and moving toward a joint, data-agnostic model; design baselines and evaluation protocols.
Prerequisites
- Proficiency in Python and solid experience with PyTorch.
- Familiarity with neural network optimization and generative models (diffusion / flows / VAEs).
- Exposure to implicit neural representations / neural fields is a strong plus.
- Interest in weight-space learning and a willingness to work on an open-ended research problem.
Supervisor
Zhuoqian (Zack) Yang, [email protected]
References
[1] Yang, Zhuoqian, Mathieu Salzmann, and Sabine Süsstrunk. “Weight Space Representation Learning via Neural Field Adaptation.” arXiv preprint arXiv:2512.01759 (2025).
[2] Han, Xiaolong, et al. “A survey of weight space learning: Understanding, representation, and generation.” arXiv preprint arXiv:2603.10090 (2026). See also: https://github.com/Zehong-Wang/Awesome-Weight-Space-Learning
[3] Sitzmann, Vincent, et al. “Implicit neural representations with periodic activation functions (SIREN).” NeurIPS 33 (2020): 7462-7473.
[4] Zhu, Jiacheng, et al. “Asymmetry in low-rank adapters of foundation models.” arXiv preprint arXiv:2402.16842 (2024).
Description
Our previous project, HDR Reconstruction for Film Simulation, validated a key hypothesis: high dynamic range (HDR) data substantially improves film simulation, particularly the characteristic highlight roll-off and shadow detail that define an authentic analog look [4]. Recovering the dynamic range that consumer sensors clip away is therefore a prerequisite for convincing film emulation.
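The highlight roll-off described above comes from the shoulder of the film's characteristic curve [4]. This curve plots optical density against log exposure and has a toe, a straight-line section and a shoulder. The toy model below uses a logistic curve in log10 exposure as a stand-in for a real film curve. Its parameters are hypothetical (density range, mid-point slope gamma, mid-point at mid-gray 0.18), and it treats normalized density as the displayed response for simplicity. It shows why HDR input matters: scene values above SDR white (1.0) map to distinct, gradually compressed outputs, whereas a clipped SDR capture collapses them to a single value before the film curve is ever applied.

```python
import numpy as np

D_MIN, D_MAX = 0.1, 2.2          # hypothetical density range (base+fog to maximum)
GAMMA = 0.65                     # hypothetical slope of the straight-line section
LOG_E_MID = np.log10(0.18)       # curve mid-point placed at scene mid-gray


def characteristic_curve(exposure):
    """Toy characteristic curve: density as a logistic function of log10 exposure."""
    log_e = np.log10(np.maximum(exposure, 1e-6))
    span = D_MAX - D_MIN
    k = 4.0 * GAMMA / span       # gives slope GAMMA at the mid-point
    return D_MIN + span / (1.0 + np.exp(-k * (log_e - LOG_E_MID)))


def film_response(exposure):
    """Density normalized to [0, 1], used here directly as the displayed value."""
    return (characteristic_curve(exposure) - D_MIN) / (D_MAX - D_MIN)


highlights = np.array([1.0, 2.0, 4.0, 8.0])           # scene-linear, SDR white = 1.0
from_hdr = film_response(highlights)                   # HDR input keeps the highlight range
from_sdr = film_response(np.minimum(highlights, 1.0))  # clipped SDR input
```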
Since then, the literature on SDR-to-HDR reconstruction (also framed as inverse tone-mapping or exposure correction) has matured rapidly. A new generation of diffusion-based methods now lifts standard 8-bit images into HDR with impressive fidelity, for example LumaFlux [1], LeDiff [2], and LumaGuide [3]. Rather than training an HDR reconstruction model from scratch, this project asks a more practical question: how well do these off-the-shelf methods serve as the front end of a film-simulation pipeline, and how should such a pipeline be designed around them?
Type of work:
- MS Level: master project
- 50% Research, 50% Engineering
Approach
The project has two phases: a rigorous evaluation of existing SDR-to-HDR methods, followed by the design of a film-simulation pipeline built on the best-performing one.
- Benchmarking SDR-to-HDR methods: Systematically evaluate recent inverse tone-mapping / exposure correction methods (LumaFlux [1], LeDiff [2], LumaGuide [3], and other relevant baselines) on their ability to recover plausible highlight and shadow detail. Assessment combines standard HDR reconstruction metrics with criteria specific to film simulation, such as the quality of highlight roll-off and tonal smoothness after compression.
- Pipeline design: Integrate the strongest method as the HDR front end of a film-simulation pipeline, feeding the reconstructed HDR signal into a physically-grounded film simulation stage. Investigate how reconstruction artifacts propagate through the pipeline and how to mitigate them.
- Evaluation and ablation: Compare the resulting pipeline against simulation applied directly to SDR inputs, quantifying the gain from HDR reconstruction and identifying failure cases.
- Extension to video (stretch goal): If time permits, investigate temporal consistency when applying the pipeline frame-by-frame to video, and assess whether the chosen reconstruction method can be adapted for temporally stable results. We target images first; a working video pipeline would be a significant plus.
The deliverable is an evaluated, reproducible film-simulation pipeline that takes ordinary SDR captures and produces images with the nuanced highlight compression, smooth tonal transitions, and rich color gradations characteristic of analog film.
Prerequisites
- Proficiency in Python and experience with PyTorch.
- Familiarity with digital imaging pipelines, HDR formats, and tone-mapping.
- Interest in photography and knowledge of film characteristics.
Supervisor
Zhuoqian (Zack) Yang, [email protected]
References
[1] Saini, Shreshth, et al. “LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers.” arXiv preprint arXiv:2604.02787 (2026).
[2] Wang, Chao, et al. “LeDiff: Latent Exposure Diffusion for HDR Generation.” Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). 2025.
[3] LumaGuide. https://shreshthsaini.github.io/LumaGuide/
[4] Attridge, G. G. “The characteristic curve.” The Journal of Photographic Science 39.2 (1991): 55-62.
Modern image generation systems behave as slot machines: the user submits a prompt, receives a fully-formed result, and may only accept or reroll. This stands in contrast to human–human collaboration on structurally complex artefacts such as floor plans, illustrations, and designs, where collaborators intervene, suggest, and correct mid-process, building together rather than ratifying a finished output. Current AI generation forecloses this regardless of architecture. Diffusion models expose only noisy latents that are not human-interpretable until late timesteps, while standard autoregressive models commit to a raster-scan order unrelated to how humans actually compose images.
Recent work suggests this gap is addressable at the model level rather than through post-hoc editing. Wang & Rohrmeier [1] show that, in the symbolic music domain, an unsupervised learned ordering policy recovers note-level importance rankings consistent with established reductive music analysis. In other words, learned generation orders can align with human structural intuition without supervision. In images, Ordered Autoregressive (OAR) generation [2] demonstrates that learned token orderings on CelebA-HQ naturally begin with semantically central regions before peripheral ones, while σ-GPTs [3] provide an architecture for on-the-fly user choice of generation order via dual positional encodings. Yet no published system exposes the model’s intermediate ordering decisions as a user-facing surface during generation. The closest precedent, Nested Diffusion [4], surfaces intermediate previews but not the model’s next-step policy. This convergent evidence, that structurally meaningful generation orders are learnable across modalities but not yet collaborative, motivates the present project.
Type of work: Master project
Approach
This project will develop an interactive image generation framework in which the model’s learned generation order is exposed as a collaborative interface. The central technical goal is a model that, through unsupervised training, develops its own preferred generation order reflecting the structural regularities of the domain, rather than one that is order-agnostic or that follows an externally prescribed order. The empirical anchor is tentatively set to residential floor plans (RPLAN [5], with CubiCasa5K [6] for supplementary validation), a domain where human structural ordering is canonical (boundary → functional zoning → room subdivision → openings) and where mid-process intervention is naturally valuable, since a constraint such as “kitchen near entrance” is most useful before later decisions are committed. The specific dataset and technical approach may be revised during the project as the student’s interests and preliminary findings dictate.
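One simple way to quantify order-structure alignment is a rank correlation between generation step and stage rank. This assumes each committed element can be labelled with one of the canonical stages above (boundary, functional zoning, room subdivision, openings). Because many elements share a stage, the sketch below uses Goodman-Kruskal's gamma, a relative of Kendall's tau that ignores tied pairs. The stage names and the function name are hypothetical.

```python
CANONICAL = {"boundary": 0, "zoning": 1, "subdivision": 2, "openings": 3}


def order_agreement(generated_stages):
    """Goodman-Kruskal gamma between generation step and canonical stage rank.

    generated_stages: stage label of each committed element, in the order the
    model generated them. Pairs from the same stage are ties and are skipped.
    """
    ranks = [CANONICAL[s] for s in generated_stages]
    concordant = discordant = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] < ranks[j]:
                concordant += 1      # earlier step, earlier stage
            elif ranks[i] > ranks[j]:
                discordant += 1      # earlier step, later stage
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


in_order = order_agreement(["boundary", "zoning", "subdivision", "openings"])
reversed_order = order_agreement(["openings", "subdivision", "zoning", "boundary"])
mixed = order_agreement(["boundary", "openings", "zoning", "subdivision"])
```

A value of 1 means the model never commits an element of a later stage before one of an earlier stage; values near 0 indicate no relation to the canonical ordering.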
- Unsupervised learning of structural generation order: We will train an autoregressive model that learns its own preferred generation order from data, drawing on recent work on learned-order autoregression (OAR [2], σ-GPT [3]). At each step, the model emits a next-position policy reflecting an unsupervised internal sense of which structural element to commit to next. The model is order-flexible enough to handle off-trajectory states (necessary for user override and retraction), but is not order-agnostic: its policy should encode meaningful structural priorities learned from the domain.
- Mid-generation intervention loop: We will design an inference procedure that exposes the model’s next-position distribution at each step, supports user-driven overrides of both position and content, and supports principled retraction of the last k steps via re-masking and re-sampling.
- Order–structure alignment and qualitative evaluation: We will compare the learned generation order against canonical structural orderings in the chosen domain, and collect qualitative feedback from an informal demo session with 5–8 users.
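The intervention loop can be prototyped independently of the model. The sketch below keeps a fixed set of slots and exposes a softmax over still-masked slots as the next-position distribution. It lets the user override position and content, and it retracts the last k commits by re-masking them so that later steps re-sample. The model is replaced by two hypothetical callables (policy_logits and sample_content); in the project, these would be the trained order-flexible autoregressive model.

```python
import numpy as np


class InteractiveGeneration:
    """Toy state for order-flexible generation over a fixed set of slots.

    policy_logits(tokens) -> one score per slot, and
    sample_content(tokens, position) -> content for that slot,
    are hypothetical stand-ins for the trained model. Content must not be None,
    because None marks a masked slot.
    """

    def __init__(self, num_slots, policy_logits, sample_content, seed=0):
        self.tokens = [None] * num_slots   # None = still masked
        self.history = []                  # committed positions, in order
        self.policy_logits = policy_logits
        self.sample_content = sample_content
        self.rng = np.random.default_rng(seed)

    def next_position_distribution(self):
        """Softmax restricted to masked slots: the policy shown to the user."""
        masked = np.array([t is None for t in self.tokens])
        if not masked.any():
            raise RuntimeError("all slots are committed")
        logits = np.asarray(self.policy_logits(self.tokens), dtype=float)
        logits = np.where(masked, logits, -np.inf)
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    def step(self, position=None, content=None):
        """Commit one element; the user may override the position and/or content."""
        if position is None:
            probs = self.next_position_distribution()
            position = int(self.rng.choice(len(probs), p=probs))
        if self.tokens[position] is not None:
            raise ValueError("position already committed")
        if content is None:
            content = self.sample_content(self.tokens, position)
        self.tokens[position] = content
        self.history.append(position)
        return position, content

    def retract(self, k):
        """Undo the last k commits by re-masking them; later steps re-sample."""
        for _ in range(min(k, len(self.history))):
            self.tokens[self.history.pop()] = None


gen = InteractiveGeneration(
    num_slots=4,
    policy_logits=lambda tokens: np.arange(len(tokens), dtype=float),  # toy preference
    sample_content=lambda tokens, pos: f"element-{pos}",
)
gen.step(position=2, content="kitchen")   # user override before the model commits
gen.step()                                # model chooses among slots 0, 1, 3
gen.step()
gen.retract(2)                            # undo the two model steps
```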
The resulting framework should enable users to collaborate with a generative model on structurally complex visual artefacts, proposing, modifying, or retracting structural decisions mid-generation in a way currently impossible with diffusion-based or fixed-order autoregressive systems. While the project is instantiated in the image domain, the underlying principle of learned, human-communicable generation order is modality-general.
Prerequisites
- Proficiency in Python and PyTorch
- Familiarity with diffusion and autoregressive image generation
- Interest in HCI evaluation and creative AI tools
- Bonus: experience with interactive web demos (Gradio, Streamlit) and basic architectural intuition
Supervisors
Xiaoxuan Wang, [email protected]
Zhuoqian (Zack) Yang, [email protected]
References
[1] Wang, X. & Rohrmeier, M. (2025). Adaptive Path of Prediction: An Unsupervised Method for Modeling Note-Level Informational Hierarchy of Polyphony. Proc. ISMIR 2025, Daejeon, Korea, pp. 565–572.
[2] Pramanik, R., et al. (2025). Distilling Specialized Orders for Visual Generation. arXiv:2504.17069.
[3] Pannatier, A., Courdier, E. & Fleuret, F. (2024). σ-GPTs: A New Approach to Autoregressive Models. ECML-PKDD 2024.
[4] Elata, N., Kawar, B., Michaeli, T. & Elad, M. (2023). Nested Diffusion Processes for Anytime Image Generation. arXiv:2305.19066.
[5] Wu, W., Fu, X.-M., Tang, R., et al. (2019). Data-driven Interior Plan Generation for Residential Buildings. ACM Transactions on Graphics (SIGGRAPH Asia), 38(6).
[6] Kalervo, A., Ylioinas, J., Häikiö, M., Karhu, A. & Kannala, J. (2019). CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis. Scandinavian Conference on Image Analysis.